Re: [PATCH v2 0/6] Rework REFCOUNT_FULL using atomic_fetch_* operations

From: Kees Cook
Date: Wed Aug 28 2019 - 17:03:45 EST


On Wed, Aug 28, 2019 at 03:14:40PM +0100, Will Deacon wrote:
> On Wed, Aug 28, 2019 at 09:30:52AM +0200, Peter Zijlstra wrote:
> > On Tue, Aug 27, 2019 at 05:31:58PM +0100, Will Deacon wrote:
> > > Will Deacon (6):
> > > lib/refcount: Define constants for saturation and max refcount values
> > > lib/refcount: Ensure integer operands are treated as signed
> > > lib/refcount: Remove unused refcount_*_checked() variants
> > > lib/refcount: Move bulk of REFCOUNT_FULL implementation into header
> > > lib/refcount: Improve performance of generic REFCOUNT_FULL code
> > > lib/refcount: Consolidate REFCOUNT_{MAX,SATURATED} definitions

BTW, can you repeat the timing details into the "Improve performance of
generic REFCOUNT_FULL code" patch?

> > So I'm not a fan; I itch at the whole racy nature of this thing and I
> > find the code less than obvious. Yet, I have to agree it is exceedingly
> > unlikely the race will ever actually happen, I just don't want to be the
> > one having to debug it.
>
> FWIW, I think much the same about the version under arch/x86 ;)
>
> > I've not looked at the implementation much; does it do all the same
> > checks the FULL one does? The x86-asm one misses a few iirc, so if this
> > is similarly fast but has all the checks, it is in fact better.
>
> Yes, it passes all of the REFCOUNT_* tests in lkdtm [1] so I agree that
> it's an improvement over the asm version.
>
> > Can't we make this a default !FULL implementation?
>
> My concern with doing that is I think it would make the FULL implementation
> entirely pointless. I can't see anybody using it, and it would only exist
> as an academic exercise in handling the theoretical races. That's a change
> from the current situation where it genuinely handles cases which the
> x86-specific code does not and, judging by the Kconfig text, that's the
> only reason for its existence.

Looking at timing details, the new implementation is close enough to the
x86 asm version that I would be fine to drop the x86-specific case
entirely as long as we could drop "FULL" entirely too -- we'd have _one_
refcount_t implementation: it would be both complete and fast.

However, I do think a defconfig image size comparison should be done as
part of that too. I think this implementation will be larger than the
x86 asm one (but not by any amount that I think is a problem).

I'd also note that the saturation speed is likely faster in this
implementation (i.e. the number of instructions between noticing the
wrap and setting the saturation value), as it is on the other side of
a branch instead of across a trap, trap handler lookup, and call. So
the race window should even be smaller (though I continue to think it
remains hard enough to hit as to make it a non-issue in all cases: if
you can schedule INT_MAX / 2 increments before a handful of instructions
resets it to INT_MAX / 2, I suspect there are much larger problems. :)

--
Kees Cook