Re: [PATCH v2] netlink: Replace rhash_portid with bound

From: Herbert Xu
Date: Wed Sep 23 2015 - 02:14:34 EST


On Tue, Sep 22, 2015 at 12:10:56PM -0400, Tejun Heo wrote:
>
> That's a pentium pro era errata. Virtually no working machine is
> affected by that anymore and nobody builds kernel with that option.
> In most cases, store_release and load_acquire are cheaper as they're
> more specific. On x86, store_release and load_acquire boil down to
> compiler reordering barriers. You're running in the opposite
> direction.

It doesn't matter on x86 but the semantics of smp_load_acquire
and smp_store_release explicitly includes a full barrier which
is something that we don't need.

> I mean, read this thread. It's one subtle breakage after another, one
> confusion after another. The problematic usages of memory barriers
> are usually of two types.

smp_load_acquire/store_release isn't some kind of magic dust
that you can just spread around to fix races. Whoever is writing
the code had better understood what the code is doing or they will
end up creating more bugs.

Having said that, I very much appreciate the work you have done
in reviewing my patches and pointing out holes in them. Please
continue to do so and I will look at any real issues that you
discover.

> 1. Misunderstand what barriers do and fail to, most frequently, pair
> the barriers correctly. This leads to things like lone smp_wmb()s
> which don't do anything but provide false sense of security.

smp_store_release can give you exactly the same kind of false
sense of security.

> 2. The usage itself is correct but not properly documented and it's
> not clear what's being synchronized. Because there's nothing
> inherently pairing the matching barrier pairs, w/o proper
> documentation, it can be very challenging to track down what is
> being synchronized making it difficult to tell this case from 1.
> Note that this is another reason smp_store_release() and
> smp_load_acquire() are just better. Their specificity not only
> makes them lighter but also makes it a lot easier to track down
> what's going on.

Well if there is anything unclear in my patch please point them out
and I will rewrite and/or add comments where necessary.

> > The reason this one is OK is because we do not use nlk->portid or
> > try to get nlk from the hash table before we return to user-space.
>
> What happens if somebody later adds code below that which happens to
> use portid? You're creating a booby trap and the patch isn't even
> properly documenting what's going on.

The same thing can still happen even if you use smp_load_acquire.
Someone may add a read prior to it or add a second smp_load_acquire
and then screw up the logic as we have seen in netlink_bind.

As I said whoever is touching these lockless code paths need to
understand what is going on. There is no easy way out.

Cheers,
--
Email: Herbert Xu <herbert@xxxxxxxxxxxxxxxxxxx>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/