Re: [PATCH 1/7] can: rx-offload: continue on error

From: Jeroen Hofstee
Date: Sun Oct 13 2019 - 17:27:33 EST


Hello Marc,

On 10/9/19 3:18 PM, Marc Kleine-Budde wrote:
> Hello Jeroen,
>
> I'm currently looking at the rx-offload error handling in detail.
>
> TLDR: I've added the patch to linux-can.
>
> Thanks,
> Marc
>
> For the record, the details:
>
> On 9/24/19 8:45 PM, Jeroen Hofstee wrote:
>> While can_rx_offload_offload_one will call mailbox_read to discard
>> the mailbox in case of an error,
> Yes.
>
> can_rx_offload_offload_one() will read into the discard mailbox in case
> of an error.
>
> Currently there are two kinds of errors:
> 1) the rx-offoad skb queue (between the IRQ handler and the NAPI)
> is full
> 2) alloc_can_skb() fails to allocate a skb, due to OOM
>
>> can_rx_offload_irq_offload_timestamp bails out in the error case.
> Yes, in case of an error both can_rx_offload_irq_offload_timestamp() and
> can_rx_offload_irq_offload_fifo() will stop reading mailboxes, add the
> already filled skbs to the queue and schedule NAPI if needed.
>
> Currently both can_rx_offload_irq_offload_timestamp() and
> can_rx_offload_irq_offload_fifo() will return the number of queued
> mailboxes.
>
> This means in case of queue overflow or OOM, only one mailbox is
> discaded, before can_rx_offload_irq_offload_*() returning the number of
> successfully queued mailboxes so far.
>
> Looking at the flexcan driver:
>
> https://elixir.bootlin.com/linux/latest/source/drivers/net/can/flexcan.c#L867
>
>> while ((reg_iflag = flexcan_read_reg_iflag_rx(priv))) {
>> handled = IRQ_HANDLED;
>> ret = can_rx_offload_irq_offload_timestamp(&priv->offload,
>> reg_iflag);
>> if (!ret)
>> break;
>> }
> [...]
>> if (reg_iflag1 & FLEXCAN_IFLAG_RX_FIFO_AVAILABLE) {
>> handled = IRQ_HANDLED;
>> can_rx_offload_irq_offload_fifo(&priv->offload);
>> }
> This means for the timestamp mode, at least one mailbox is discarded or
> if the error occurred after reading one or more mailboxes the while loop
> will try again. If the error persists a second mailbox is discarded.
>
> For the fifo mode, only one mailbox is discarded.
>
> Then the flexcan's IRQ is exited. If there are messages in mailboxes are
> pending another IRQ is triggered.... I doubt that this is a good idea.
>
> On the other hand the ti_hecc driver:
>
>> /* offload RX mailboxes and let NAPI deliver them */
>> while ((rx_pending = hecc_read(priv, HECC_CANRMP))) {
>> can_rx_offload_irq_offload_timestamp(&priv->offload,
>> rx_pending);
>> hecc_write(priv, HECC_CANRMP, rx_pending);
>> }
> The error is ignored and the _all_ mailboxes are discarded (given the
> error persists).
>
>> Since it is typically called from a while loop, all message will
>> eventually be discarded. So lets continue on error instead to discard
>> them directly.
> After reading my own code and writing it up, your patch totally makes sense.
>
> If there is a shortage of resources, queue overrun or OOM, it makes no
> sense to return from the IRQ handler, if a mailbox is still active as it
> will trigger the IRQ again. Entering the IRQ handler again probably
> doesn't give the system time to recover from the resource problem.


Indeed, I have nothing to comment on that, besides thanks for
being willing to reconsider your own code.

With kind regards,

Jeroen