RE: [PATCH] net: ftgmac100: Fix missing TX-poll issue

From: Dylan Hung
Date: Tue Oct 20 2020 - 02:15:05 EST


> -----Original Message-----
> From: Jakub Kicinski [mailto:kuba@xxxxxxxxxx]
> Sent: Tuesday, October 20, 2020 3:01 AM
> To: Joel Stanley <joel@xxxxxxxxx>
> Cc: Dylan Hung <dylan_hung@xxxxxxxxxxxxxx>; Benjamin Herrenschmidt
> <benh@xxxxxxxxxxxxxxxxxxx>; David S . Miller <davem@xxxxxxxxxxxxx>;
> netdev@xxxxxxxxxxxxxxx; Linux Kernel Mailing List
> <linux-kernel@xxxxxxxxxxxxxxx>; Po-Yu Chuang <ratbert@xxxxxxxxxxxxxxxx>;
> linux-aspeed <linux-aspeed@xxxxxxxxxxxxxxxx>; OpenBMC Maillist
> <openbmc@xxxxxxxxxxxxxxxx>; BMC-SW <BMC-SW@xxxxxxxxxxxxxx>
> Subject: Re: [PATCH] net: ftgmac100: Fix missing TX-poll issue
>
> On Mon, 19 Oct 2020 08:57:03 +0000 Joel Stanley wrote:
> > > diff --git a/drivers/net/ethernet/faraday/ftgmac100.c b/drivers/net/ethernet/faraday/ftgmac100.c
> > > index 00024dd41147..9a99a87f29f3 100644
> > > --- a/drivers/net/ethernet/faraday/ftgmac100.c
> > > +++ b/drivers/net/ethernet/faraday/ftgmac100.c
> > > @@ -804,7 +804,8 @@ static netdev_tx_t ftgmac100_hard_start_xmit(struct sk_buff *skb,
> > > * before setting the OWN bit on the first descriptor.
> > > */
> > > dma_wmb();
> > > - first->txdes0 = cpu_to_le32(f_ctl_stat);
> > > + WRITE_ONCE(first->txdes0, cpu_to_le32(f_ctl_stat));
> > > + READ_ONCE(first->txdes0);
> >
> > I understand what you're trying to do here, but I'm not sure that this
> > is the correct way to go about it.
> >
> > It does cause the compiler to produce a store and then a load.

Yes, the load instruction is there to guarantee that the preceding store has actually reached physical memory.

>
> +1 @first is system memory from dma_alloc_coherent(), right?
>
> You shouldn't have to do this. Is coherent DMA memory broken on your
> platform?

It is about arbitration in the DRAM controller. The DRAM controller has two queues: one for CPU accesses and one for the HW engines.
When the CPU issues a store, the DRAM controller acknowledges the request immediately and pushes it into the CPU queue. The CPU then triggers the HW MAC engine, and the engine starts fetching from DMA memory.
But since the CPU's store may still be sitting in the queue at that point, the HW engine may fetch stale data.
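
To illustrate the sequence above, here is a toy model in C. It is only a sketch and not the real hardware or driver code: struct dram_model, MODEL_OWN_BIT and all function names are made up for this example, the CPU queue is reduced to a single slot, and it assumes that a CPU load is ordered behind earlier CPU stores in the same queue, so the load cannot complete until the store has drained.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the OWN bit in txdes0 (model value only). */
#define MODEL_OWN_BIT 0x80000000u

/* Toy model: one DRAM word plus a one-slot queue for posted CPU stores. */
struct dram_model {
	uint32_t cell;          /* value actually held in DRAM */
	uint32_t pending_value; /* CPU store acknowledged but not yet written */
	bool pending;           /* true while the CPU store is still queued */
};

/* CPU store: acknowledged at once, but only placed in the CPU queue. */
static void cpu_store(struct dram_model *m, uint32_t value)
{
	m->pending_value = value;
	m->pending = true;
}

/* The controller eventually writes the queued store into DRAM. */
static void drain_cpu_queue(struct dram_model *m)
{
	if (m->pending) {
		m->cell = m->pending_value;
		m->pending = false;
	}
}

/*
 * CPU load: assumed to be served from the same queue, behind the
 * earlier store, so the store is written to DRAM before the load returns.
 */
static uint32_t cpu_load(struct dram_model *m)
{
	drain_cpu_queue(m);
	return m->cell;
}

/* HW engine fetch: goes through the other queue and sees only DRAM. */
static uint32_t hw_fetch(const struct dram_model *m)
{
	return m->cell;
}

/* Store, then kick the MAC immediately: the fetch can see the old value. */
static uint32_t xmit_without_readback(struct dram_model *m)
{
	cpu_store(m, MODEL_OWN_BIT);
	return hw_fetch(m);
}

/* Store, read back, then kick the MAC: the fetch sees the new value. */
static uint32_t xmit_with_readback(struct dram_model *m)
{
	cpu_store(m, MODEL_OWN_BIT);
	(void)cpu_load(m);
	return hw_fetch(m);
}
```

In this model, xmit_without_readback() returns a descriptor word without the OWN bit set, which corresponds to the MAC seeing nothing to transmit, while xmit_with_readback() returns the word with the OWN bit set.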