Re: can: isotp: epoll breaks isotp_sendmsg
From: Lukas Magel
Date: Fri Aug 18 2023 - 07:57:29 EST
On 13.08.23 13:23, Lukas Magel wrote:
> Hi Maxime, hi Michal,
>
> On 01.07.23 00:35, Michal Sojka wrote:
>> Hi Maxime,
>>
>> On Fri, Jun 30 2023, Maxime Jayat wrote:
>>> Hi,
>>>
>>> There is something not clear happening with the non-blocking behavior
>>> of ISO-TP sockets in the TX path, but more importantly, using epoll now
>>> completely breaks isotp_sendmsg.
>>> I believe it is related to
>>> 79e19fa79c ("can: isotp: isotp_ops: fix poll() to not report false
>>> EPOLLOUT events"),
>>> but actually is probably deeper than that.
>>>
>>> I don't completely understand what is exactly going on, so I am sharing
>>> the problem I face:
>>>
>>> With an ISO-TP socket in non-blocking mode, using epoll seems to make
>>> isotp_sendmsg always return -EAGAIN.
>> That's definitely not expected behavior. I tested the patch only with
>> poll, hoping that epoll would behave the same.
>>
>> [...]
>
> I am writing to report that we have been witnessing a behavior very similar
> to what you describe. ISO-TP send breaks with EAGAIN if a poll (for a read)
> occurs at the same time.
>
> Our Python stack uses two threads to do concurrent, blocking reads & writes
> from and to the ISO-TP socket. The socket has a timeout of 0.1s to facilitate
> shutdown when requested by the application. Notably, the blocking semantics
> are handled by CPython, i.e. the underlying kernel socket is non-blocking.
> CPython polls until the requested operation (read or write) can be executed
> or the timeout occurs.
>
> What happens during execution is that the socket is continuously being
> polled by the read thread, i.e. so->wait is always filled with one task.
> This process repeats until the socket receives a frame from the bus and the
> poll returns successsfully. The app reads the data from the socket and
> sends a response. Since the send occurs in a different thread, the reader
> thread will have already returned to its poll loop and to populating
> so->wait. When the send occurs, isotp_sendmsg checks so->wait for sleepers
> and returns EAGAIN because the socket is non-blocking although there is no
> concurrent send operation. This dance continues until the timeout occurs for
> either the read or the write operation. If the write times out first, a
> timeout error causes the Python app to break. If the read times out first,
> there is a race that the write goes through or the reader puts in
> another poll.
>
> This behavior can be seen rather nicely in strace:
> 110580 poll([{fd=5, events=POLLIN}], 1, 100) = 0 (Timeout)
> 110580 poll([{fd=5, events=POLLIN}], 1, 100) = 0 (Timeout)
> 110580 poll([{fd=5, events=POLLIN}], 1, 100) = 1 ([{fd=5, revents=POLLIN}])
> 110580 recvfrom(5, ">\0", 4095, 0, NULL, NULL) = 2
> 110580 poll([{fd=5, events=POLLIN}], 1, 100 <unfinished ...>
> 110569 poll([{fd=5, events=POLLOUT}], 1, 100) = 1 ([{fd=5, revents=POLLOUT}])
> 110569 sendto(5, "~\0", 2, 0, NULL, 0) = -1 EAGAIN (Resource temporarily unavailable)
> 110569 poll([{fd=5, events=POLLOUT}], 1, 100) = 1 ([{fd=5, revents=POLLOUT}])
> 110569 sendto(5, "~\0", 2, 0, NULL, 0) = -1 EAGAIN (Resource temporarily unavailable)
> 110569 poll([{fd=5, events=POLLOUT}], 1, 100) = 1 ([{fd=5, revents=POLLOUT}])
> 110569 sendto(5, "~\0", 2, 0, NULL, 0) = -1 EAGAIN (Resource temporarily unavailable)
> [....]
> 110569 poll([{fd=5, events=POLLOUT}], 1, 6) = 1 ([{fd=5, revents=POLLOUT}])
> 110569 sendto(5, "~\0", 2, 0, NULL, 0) = -1 EAGAIN (Resource temporarily unavailable)
> 110569 poll([{fd=5, events=POLLOUT}], 1, 6 <unfinished ...>
> 110580 <... poll resumed>) = 0 (Timeout)
> 110569 <... poll resumed>) = 1 ([{fd=5, revents=POLLOUT}])
> 110580 poll([{fd=5, events=POLLIN}], 1, 100 <unfinished ...>
> 110569 sendto(5, "~\0", 2, 0, NULL, 0) = -1 EAGAIN (Resource temporarily unavailable)
> 110569 poll([{fd=5, events=POLLOUT}], 1, 5) = 1 ([{fd=5, revents=POLLOUT}])
> 110569 sendto(5, "~\0", 2, 0, NULL, 0) = -1 EAGAIN (Resource temporarily unavailable)
>
> I believe this is consistent to the behavior you're witnessing with epoll
> because epoll also places a sleeper in so->wait that is left in the queue
> until the epoll descriptor is closed.
>
>
>>> By reverting 79e19fa79c, I get better results but still incorrect:
>> [...]
>>
>>> It is then possible to write on the socket but the write is blocking,
>>> which is not the expected behavior for a non-blocking socket.
>> Yes, incorrect behavior was why we made the commit in question, however
>> we saw write() returning -EAGAIN when it shouldn't.
>>
>>> I don't know how to solve the problem. To me, using wq_has_sleeper seems
>>> weird.
>> Agreed. I've never tried to understand how synchronization works here.
>> Hopefully, Oliver knows more.
>>
>>> The implementation of isotp_poll feels weird too (calling both
>>> datagram_poll and
>>> poll_wait?). But I am not sure what would be the correct
>>> implementation.
>> I understand it as follows (which might be wrong - someone, please
>> correct me), isotp_poll() should register the file with all waitqueues
>> it can wait on. so->wait is one and sock->sq.wait (used by
>> datagram_poll) is another. The former is definitely used for TX, the
>> latter is probably used because skb_recv_datagram() is called for RX.
>> But so->wait is also used for RX and there might proabbly be be some
>> inconsistency between those.
>
> AFAIK, it is correct behavior for isotp_poll to register all wait queues
> with the poller.
>
> Before 79e19fa79c, I assume datagram_poll always returned
> EPOLLOUT because its corresponding send buffer is unused and empty. This
> return value can be incorrect if so->tx.state is not IDLE and a send would
> block. With the patch, this behavior is now suppressed. I believe that
> the inconsistency could have rather been introduced with:
> can: isotp: fix race between isotp_sendsmg() and isotp_release() 0517374
>
> With this patch, the behavior of isotp_sendmsg was changed to only check
> so->wait for sleepers instead of consulting so->tx.state to see if the
> socket is busy. Since the wait queue can also have sleepers only interested
> in read operations, I believe the return value is not a valid indicator
> of send readiness. Additionally, in this state, the behavior in isotp_sendmsg
> is inconsistent with isotp_poll. I will try to test next week if reverting this
> part of the patch could fix the race condition.
>
I submitted a patch to the mailing list and would very much
appreciate your feedback:
https://lore.kernel.org/linux-can/20230818114345.142983-1-lukas.magel@xxxxxxxxxx/
@Maxime, I tried your isotprecv / isotpsend snippet and the issue did
not occur with the supplied patch. The send operation went through right
away.
@Oliver I adjusted the exit path for the case where the initial wait is
interrupted to return immediately instead of jumping to err_event_drop.
Could you please check if you would agree with this change?
Regards,
Lukas