Re: [PATCH v3 10/15] net: ethernet: mtk-eth-mac: new driver

From: Bartosz Golaszewski
Date: Mon May 18 2020 - 10:07:36 EST


Fri, 15 May 2020 at 15:32 Arnd Bergmann <arnd@xxxxxxxx> wrote:
>
> On Thu, May 14, 2020 at 10:00 AM Bartosz Golaszewski <brgl@xxxxxxxx> wrote:
> > +static int mtk_mac_ring_pop_tail(struct mtk_mac_ring *ring,
> > + struct mtk_mac_ring_desc_data *desc_data)
>
> I took another look at this function because of your comment about
> locking the descriptor updates, which seemed suspicious as the device
> side does not actually use the locks to access them.
>
> > +{
> > + struct mtk_mac_ring_desc *desc = &ring->descs[ring->tail];
> > + unsigned int status;
> > +
> > + /* Let the device release the descriptor. */
> > + dma_rmb();
> > + status = desc->status;
> > + if (!(status & MTK_MAC_DESC_BIT_COWN))
> > + return -1;
>
> The dma_rmb() seems odd here, as I don't see which prior read
> is being protected by this.
>
> > + desc_data->len = status & MTK_MAC_DESC_MSK_LEN;
> > + desc_data->flags = status & ~MTK_MAC_DESC_MSK_LEN;
> > + desc_data->dma_addr = ring->dma_addrs[ring->tail];
> > + desc_data->skb = ring->skbs[ring->tail];
> > +
> > + desc->data_ptr = 0;
> > + desc->status = MTK_MAC_DESC_BIT_COWN;
> > + if (status & MTK_MAC_DESC_BIT_EOR)
> > + desc->status |= MTK_MAC_DESC_BIT_EOR;
> > +
> > + /* Flush writes to descriptor memory. */
> > + dma_wmb();
>
> The comment and the barrier here seem odd as well. I would have expected
> a barrier after the update to the data pointer, and a single store of the
> status flags instead of the read-modify-write, something like
>
> desc->data_ptr = 0;
> dma_wmb(); /* make pointer update visible before status update */
> desc->status = MTK_MAC_DESC_BIT_COWN | (status & MTK_MAC_DESC_BIT_EOR);
>
> > + ring->tail = (ring->tail + 1) % MTK_MAC_RING_NUM_DESCS;
> > + ring->count--;
>
> I would get rid of the 'count' here, as it duplicates the information
> that is already known from the difference between head and tail, and you
> can't update it atomically without holding a lock around the access to
> the ring. The way I'd do this is to have the head and tail pointers
> in separate cache lines, and then use READ_ONCE/WRITE_ONCE
> and smp barriers to access them, with each one updated on one
> thread but read by the other.
>
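
If I understand the READ_ONCE()/WRITE_ONCE() idea correctly, it would
look more or less like the following - just a sketch, the struct
layout, helper names and error handling are all made up:

struct mtk_mac_ring {
        struct mtk_mac_ring_desc *descs;
        struct sk_buff *skbs[MTK_MAC_RING_NUM_DESCS];
        /* Written by the xmit path only, read by the cleanup path. */
        unsigned int head ____cacheline_aligned;
        /* Written by the cleanup path only, read by the xmit path. */
        unsigned int tail ____cacheline_aligned;
};

/* Producer side, called from ndo_start_xmit(). */
static int mtk_mac_ring_push_head(struct mtk_mac_ring *ring,
                                  struct sk_buff *skb)
{
        unsigned int head = READ_ONCE(ring->head);
        unsigned int next = (head + 1) % MTK_MAC_RING_NUM_DESCS;

        /* Full when advancing head would catch up with tail. */
        if (next == READ_ONCE(ring->tail))
                return -ENOSPC;

        /* ... map the skb and fill in ring->descs[head] ... */

        /* Publish the descriptor before moving head. */
        smp_store_release(&ring->head, next);
        return 0;
}

/* Consumer side, called from the TX cleanup path. */
static void mtk_mac_ring_cleanup(struct mtk_mac_ring *ring)
{
        unsigned int tail = READ_ONCE(ring->tail);

        /* Pairs with the store-release of head in the xmit path. */
        while (tail != smp_load_acquire(&ring->head)) {
                /* ... unmap and free ring->skbs[tail] ... */
                tail = (tail + 1) % MTK_MAC_RING_NUM_DESCS;
        }

        /* Release the reclaimed slots back to the xmit path. */
        smp_store_release(&ring->tail, tail);
}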

Your previous solution seems much more reliable though. Take the case
above: during TX cleanup (we got the TX-ready irq and are iterating
over the descriptors until either there are no more packets scheduled
(count == 0) or we hit one that's still owned by the DMA engine), a
parallel TX path can schedule new packets to be sent. I don't see how
we could atomically check the count (understood as the difference
between head and tail) and run another iteration (in which we'd modify
the head or tail) without the other path getting in the way. We'd have
to check the descriptor itself every time.
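
In other words the cleanup loop ends up being driven by the ownership
bit alone, along the lines of (again just a sketch):

        for (;;) {
                struct mtk_mac_ring_desc *desc = &ring->descs[ring->tail];
                unsigned int status;

                status = READ_ONCE(desc->status);
                if (!(status & MTK_MAC_DESC_BIT_COWN))
                        break; /* the device still owns this descriptor */

                /*
                 * Read the rest of the descriptor only after we've
                 * seen COWN set.
                 */
                dma_rmb();

                /* ... unmap, free the skb, hand the descriptor back ... */

                ring->tail = (ring->tail + 1) % MTK_MAC_RING_NUM_DESCS;
        }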

I experimented a bit with this and couldn't come up with anything that
would pass any stress test.

On the other hand, spin_lock_bh() works fine and I like your approach
from the previous e-mail. The one exception is the stats-update work:
we could lose some stats when updating them in process context while
the RX/TX paths run in parallel in NAPI context, but that should be
rare enough to overlook.
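
For completeness, the locked cleanup I have in mind is essentially the
following (simplified - mtk_mac_tx_complete_all() and the lock's name
are approximations):

static void mtk_mac_tx_complete_all(struct mtk_mac_priv *priv)
{
        struct mtk_mac_ring *ring = &priv->tx_ring;
        struct mtk_mac_ring_desc_data desc_data;

        spin_lock_bh(&priv->lock);
        for (;;) {
                /* Stop once we hit a descriptor the DMA still owns. */
                if (mtk_mac_ring_pop_tail(ring, &desc_data))
                        break;

                /* ... dma_unmap_single() + dev_kfree_skb() ... */
        }
        spin_unlock_bh(&priv->lock);
}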

I hope v4 will be good enough even with spinlocks. :)

Bart