Re: [PATCH 0/5] v2: block subsystem refcounter conversions

From: James Bottomley
Date: Fri Apr 21 2017 - 18:01:39 EST


On Fri, 2017-04-21 at 14:30 -0700, Kees Cook wrote:
> On Fri, Apr 21, 2017 at 2:27 PM, James Bottomley
> <James.Bottomley@xxxxxxxxxxxxxxxxxxxxx> wrote:
> > On Fri, 2017-04-21 at 13:22 -0700, Kees Cook wrote:
> > > On Fri, Apr 21, 2017 at 12:55 PM, Eric Biggers <
> > > ebiggers3@xxxxxxxxx>
> > > wrote:
> > > > > > Of course, having extra checks behind a debug option is
> > > > > > fine. But they should not be part of the base feature; the
> > > > > > base feature should just be mitigation of reference count
> > > > > > *overflows*. It would be nice to do more, of course; but
> > > > > > when the extra stuff prevents people from using refcount_t
> > > > > > for performance reasons, it defeats the point of the
> > > > > > feature in the first place.
> > > > >
> > > > > Sure, but as I said above, I think the smaller tricks and
> > > > > fixes won't be convincing enough, so their value is
> > > > > questionable.
> > > >
> > > > This makes no sense, as the main point of the feature is
> > > > supposed to be the security improvement. As-is, the extra
> > > > debugging stuff is actually preventing the security improvement
> > > > from being adopted, which is unfortunate.
> > >
> > > We've been trying to handle the conflicting desires of those
> > > wanting very precise refcounting implementation and gaining the
> > > security protections. Ultimately, the best way forward seemed to
> > > be to first land the precise refcounting implementation, and
> > > start conversion until we ran into concerns over performance.
> > > Now, since we're here, we can move forward with getting a fast
> > > implementation that provides the desired security protections
> > > without too greatly messing with the refcount API.
> >
> > But that's not what it really looks like. What it looks like is
> > someone came up with a new API and is now intent on forcing it
> > through the kernel in about 500 patches using security as the
> > hammer.
>
> The intent is to split refcounting and statistical counters away from
> atomics, since they have distinct APIs. This will let us audit the
> remaining atomic uses much more easily.

But the security problem is counter overflow, right? That problem, as
far as I can see, exists in the atomics as well. I'm sure there might
be one or two corner cases that depend on overflow actually working,
but I can't think of them.
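
To make the overflow point concrete, here is a minimal userspace sketch
using C11 atomics. It is not the kernel's atomic_t implementation, and
the names checked_inc and unchecked_inc are hypothetical, chosen only
for illustration. It assumes the check is done by pinning the counter
at INT_MAX, which is one possible policy among several.

```c
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical helper, not a kernel API: increment that refuses to wrap.
 * Returns false and leaves the counter pinned once it reaches INT_MAX.
 */
static bool checked_inc(atomic_int *v)
{
    int old = atomic_load_explicit(v, memory_order_relaxed);

    do {
        if (old == INT_MAX)
            return false;   /* saturated: do not wrap around */
        /* on failure the CAS reloads 'old', so the check is repeated */
    } while (!atomic_compare_exchange_weak_explicit(v, &old, old + 1,
                                                    memory_order_relaxed,
                                                    memory_order_relaxed));
    return true;
}

/*
 * Hypothetical unchecked variant: a plain increment that wraps silently,
 * for callers that want the cheapest operation or rely on overflow.
 */
static void unchecked_inc(atomic_int *v)
{
    atomic_fetch_add_explicit(v, 1, memory_order_relaxed);
}
```

The cost of the checked form in this sketch is the extra compare plus a
compare-and-exchange loop in place of a single fetch-and-add. How many
cycles that is on a given architecture is not something the sketch
settles.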

The refcount vs atomic distinction comes down to the precise meaning
of decrement to zero. I'm not saying there's no benefit to detecting
the condition, but the security problem looks to be much more
pressing, which is why I think that distinction can be argued on the
merits later.
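
As an illustration of what "decrement to zero" means for a reference
count as opposed to a plain atomic, here is a minimal userspace sketch
in C11. The names ref_get_not_zero and ref_put are hypothetical and
this is not the kernel's refcount_t code. It assumes the usual
convention that a count of zero means the object is being freed, so no
new reference may be taken from zero.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical helpers, for illustration only. */

/* Take a reference unless the count has already dropped to zero. */
static bool ref_get_not_zero(atomic_uint *r)
{
    unsigned int old = atomic_load_explicit(r, memory_order_relaxed);

    do {
        if (old == 0)
            return false;   /* object is already being torn down */
    } while (!atomic_compare_exchange_weak_explicit(r, &old, old + 1,
                                                    memory_order_acquire,
                                                    memory_order_relaxed));
    return true;
}

/* Drop a reference; returns true if the caller dropped the last one. */
static bool ref_put(atomic_uint *r)
{
    return atomic_fetch_sub_explicit(r, 1, memory_order_acq_rel) == 1;
}
```

A plain atomic counter attaches no such meaning to zero: incrementing
it from zero is just another increment.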

> > If we were really concerned about security first, we'd have fixed
> > the atomic overflow problem in the atomics and then built the
> > precise refcounting on that and tried to persuade people of the
> > merits.
>
> I agree, but this approach was NAKed by multiple atomics maintainers.

Overriding that decision by trying to convince all the consumers to
move to a new API doesn't seem to be going so well either. Perhaps we
could assist you in changing the minds of the atomics maintainers ...
what was the primary problem? The additional couple of cycles or the
fact that some use cases want overflow (or something else)?

James

> > Why can't we still do this? It looks like the overflow protection
> > will add only a couple of cycles to the atomics. We can move the
> > current version to an unchecked variant which can be used either in
> > truly performance critical regions with no security implications or
> > if someone really needs the atomic to overflow. From my point of
> > view it would give us the 90% case (security) and stop this endless
> > argument over the fast paths. Subsystems which have already moved
> > to refcount would stay there and the rest get to evaluate a
> > migration on the merits of the operational utility.
>
> -Kees
>