Re: [lkp-robot] [x86/refcount] b631e535c6: WARNING:at_net/netlink/af_netlink.c:#netlink_sock_destruct

From: Kees Cook
Date: Tue Jul 25 2017 - 14:38:35 EST


On Tue, Jul 25, 2017 at 3:43 AM, Hans Liljestrand
<liljestrandh@xxxxxxxxx> wrote:
> On Mon, Jul 24, 2017 at 08:21:16PM -0700, Kees Cook wrote:
>>
>> On Mon, Jul 24, 2017 at 6:03 AM, Hans Liljestrand
>> <liljestrandh@xxxxxxxxx> wrote:
>>>
>>> On Sun, Jul 23, 2017 at 08:52:53PM -0700, Kees Cook wrote:
>>>>
>>>>
>>>> Is 14afee4b6092f ("net: convert sock.sk_wmem_alloc from atomic_t to
>>>> refcount_t") correct? That looks like a statistics counter, not a
>>>> refcounter? I can't quite tell, though...
>>>
>>>
>>>
>>> Hmm, yes, it looks a bit weird, but it is used in a refcount fashion
>>> here:
>>>
>>> void sk_free(struct sock *sk)
>>> {
>>> 	/*
>>> 	 * We subtract one from sk_wmem_alloc and can know if
>>> 	 * some packets are still in some tx queue.
>>> 	 * If not null, sock_wfree() will call __sk_free(sk) later
>>> 	 */
>>> 	if (refcount_dec_and_test(&sk->sk_wmem_alloc))
>>> 		__sk_free(sk);
>>> }
>>>
>>> http://elixir.free-electrons.com/linux/v4.13-rc1/source/net/core/sock.c#L1605
>>
>>
>> Ah yeah, there it is. Hrmpf. Something is triggering WARNs, though...
>> I wonder if this can get examined more closely?
>
>
> I tried reproducing the error but I don't seem to know how to use lkp. Got
> lots of permission denied errors and finally ran out of disk space (after
> using up ~50GB).
>
> Maybe I did something wrong?
>
> What I did was: cloned the related kernel repository, checked out the
> offending commit, plopped in the config, and compiled a bzImage. Then I just
> cloned the lkp repo and tried running the provided command line with the
> bzImage and the provided script.
>
> I'll take another look once I have the time, might be I missed something
> earlier.

Yeah, I'm not sure. Seems it was found through trinity? And only after
36 seconds, too.

>> Also, why not atomic->refcount for sk_rmem_alloc?
>
> I couldn't find any similar refcount-like use on sk_rmem_alloc.

Okay, interesting.

> And as noted the sk_wmem_alloc thing is also a bit dubious. It looks like it
> serves a dual purpose of actual allocation size and occasional reference
> counter.

Could you ask netdev to see what is actually happening here? This
looks like a regression, but also like very odd (broken?) refcounting...
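
For anyone following along, here is a rough userspace sketch of the
dual-purpose pattern as I understand it from the sk_free() snippet above.
It is not the kernel code: the toy_* names and struct toy_sock are made up
for illustration, and I'm assuming the counter starts at 1 and that each
queued packet adds its size in bytes. The counter hits zero only once the
owner has dropped its +1 and every queued byte has been released, so
whichever side gets there last does the free.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct sock; only the counter is modelled. */
struct toy_sock {
	atomic_uint wmem_alloc;	/* queued tx bytes, plus 1 while the owner holds it */
	bool freed;		/* set at the point where __sk_free() would run */
};

static void toy_sock_init(struct toy_sock *sk)
{
	atomic_init(&sk->wmem_alloc, 1);	/* the owner's bias of one */
	sk->freed = false;
}

/* A packet is queued for tx: charge its size to the socket. */
static void toy_charge(struct toy_sock *sk, unsigned int truesize)
{
	atomic_fetch_add(&sk->wmem_alloc, truesize);
}

/* Subtract n and report whether the counter reached zero. */
static bool toy_sub_and_test(struct toy_sock *sk, unsigned int n)
{
	return atomic_fetch_sub(&sk->wmem_alloc, n) == n;
}

/* A packet left the tx queue: uncharge it, free if we were last. */
static void toy_wfree(struct toy_sock *sk, unsigned int truesize)
{
	if (toy_sub_and_test(sk, truesize))
		sk->freed = true;
}

/* The owner drops the socket: remove the bias, free if nothing is queued. */
static void toy_sk_free(struct toy_sock *sk)
{
	if (toy_sub_and_test(sk, 1))
		sk->freed = true;
}
```

So the same field moves by one in toy_sk_free() and by whole packet sizes
everywhere else, which is the part that makes it look more like a byte
counter than a plain refcount.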

-Kees


--
Kees Cook
Pixel Security