RE: [PATCH v10] platform/mellanox: Add TmFifo driver for Mellanox BlueField Soc

From: Liming Sun
Date: Wed Mar 06 2019 - 15:00:45 EST


Thanks Andy! Please see my response below. If no further comments, I'll try to post v11 after more testing.

Regards,
Liming

> -----Original Message-----
> From: Andy Shevchenko <andy.shevchenko@xxxxxxxxx>
> Sent: Tuesday, March 5, 2019 10:34 AM
> To: Liming Sun <lsun@xxxxxxxxxxxx>
> Cc: David Woods <dwoods@xxxxxxxxxxxx>; Andy Shevchenko <andy@xxxxxxxxxxxxx>; Darren Hart <dvhart@xxxxxxxxxxxxx>; Vadim
> Pasternak <vadimp@xxxxxxxxxxxx>; Linux Kernel Mailing List <linux-kernel@xxxxxxxxxxxxxxx>; Platform Driver <platform-driver-
> x86@xxxxxxxxxxxxxxx>
> Subject: Re: [PATCH v10] platform/mellanox: Add TmFifo driver for Mellanox BlueField Soc
>
> On Thu, Feb 28, 2019 at 5:51 PM Liming Sun <lsun@xxxxxxxxxxxx> wrote:
> >
> > This commit adds the TmFifo platform driver for Mellanox BlueField
> > Soc. TmFifo is a shared FIFO which enables external host machine
> > to exchange data with the SoC via USB or PCIe. The driver is based
> > on virtio framework and has console and network access enabled.
>
> Thank you for an update.
>
> Unfortunately more work is needed. My comments below.
>
> > +#define MLXBF_TMFIFO_TX_STS__COUNT_RMASK GENMASK(8, 0)
> > +#define MLXBF_TMFIFO_TX_STS__COUNT_MASK GENMASK(8, 0)
>
> > +#define MLXBF_TMFIFO_TX_CTL__LWM_RMASK GENMASK(7, 0)
> > +#define MLXBF_TMFIFO_TX_CTL__LWM_MASK GENMASK(7, 0)
>
> > +#define MLXBF_TMFIFO_TX_CTL__HWM_RMASK GENMASK(7, 0)
> > +#define MLXBF_TMFIFO_TX_CTL__HWM_MASK GENMASK(15, 8)
>
> > +#define MLXBF_TMFIFO_TX_CTL__MAX_ENTRIES_RMASK GENMASK(8, 0)
> > +#define MLXBF_TMFIFO_TX_CTL__MAX_ENTRIES_MASK GENMASK_ULL(40, 32)
>
> > +#define MLXBF_TMFIFO_RX_STS__COUNT_RMASK GENMASK(8, 0)
> > +#define MLXBF_TMFIFO_RX_STS__COUNT_MASK GENMASK(8, 0)
>
> > +#define MLXBF_TMFIFO_RX_CTL__LWM_RMASK GENMASK(7, 0)
> > +#define MLXBF_TMFIFO_RX_CTL__LWM_MASK GENMASK(7, 0)
>
> > +#define MLXBF_TMFIFO_RX_CTL__HWM_RMASK GENMASK(7, 0)
> > +#define MLXBF_TMFIFO_RX_CTL__HWM_MASK GENMASK(15, 8)
>
> > +#define MLXBF_TMFIFO_RX_CTL__MAX_ENTRIES_RMASK GENMASK(8, 0)
> > +#define MLXBF_TMFIFO_RX_CTL__MAX_ENTRIES_MASK GENMASK_ULL(40, 32)
>
> Since two of them have _ULL suffix I'm wondering if you have checked
> for side effects on the rest, i.e. if you operate with 64-bit variable
> and use something like ~MLXBF_TMFIFO_RX_CTL__MAX_ENTRIES_RMASK, it may
> give you interesting results.

The running system on the SoC is arm64 where BITS_PER_LONG and
BITS_PER_LONG_LONG have the same value. In such case, the two macros appears
to be the same. But you're right, I should use GENMASK_ULL() to be consistent
and more correctly, just in case the "CONFIG_64BIT" is not defined somehow.

Will update it in v11.

>
> > +#define MLXBF_TMFIFO_TIMER_INTERVAL (HZ / 10)
>
> > +/**
> > + * mlxbf_tmfifo_u64 - Union of 64-bit data
> > + * @data - 64-bit data in host byte order
> > + * @data_le - 64-bit data in little-endian byte order
> > + *
> > + * It's expected to send 64-bit little-endian value (__le64) into the TmFifo.
> > + * readq() and writeq() expect u64 instead. A union structure is used here
> > + * to workaround the explicit casting usage like writeq(*(u64 *)&data_le).
> > + */
>
> How do you know what readq()/writeq() does with the data? Is it on all
> architectures?
> How the endianess conversion affects the actual data?

The SoC runs arm64 and supports little endian for now. The FIFO has two sides,
one side is the SoC, the other side is extern host machine which could
access the FIFO via USB or PCIe. The rule is that the 'byte stream' will
keep the same when one side write 8 bytes and the other side reads
the 8 bytes. So as long as both sides have agreement on the byte-order
it should be fine.

After double check the arm64 readq()/writeq() implementation, it appears
that these APIs already does cpu_to_le64() and le64_to_cpu()
conversion. There's actually no need to make another conversion
(and shouldn't do it). I'll remove these conversions in v11. The code will
look much cleaner.

>
> > +union mlxbf_tmfifo_u64 {
> > + u64 data;
> > + __le64 data_le;
> > +};
>
> > +/*
> > + * Default MAC.
> > + * This MAC address will be read from EFI persistent variable if configured.
> > + * It can also be reconfigured with standard Linux tools.
> > + */
> > +static u8 mlxbf_tmfifo_net_default_mac[6] = {
> > + 0x00, 0x1A, 0xCA, 0xFF, 0xFF, 0x01};
>
> > +#define mlxbf_vdev_to_tmfifo(dev) \
> > + container_of(dev, struct mlxbf_tmfifo_vdev, vdev)
>
> One line?

Couldn't fit it into one line within 80 characters.
(Please correct me if you meant single line even exceeding 80 chracters).

>
> > +/* Return the consumed Tx buffer space. */
> > +static int mlxbf_tmfifo_vdev_tx_buf_len(struct mlxbf_tmfifo_vdev *tm_vdev)
> > +{
> > + int len;
> > +
> > + if (tm_vdev->tx_tail >= tm_vdev->tx_head)
> > + len = tm_vdev->tx_tail - tm_vdev->tx_head;
> > + else
> > + len = MLXBF_TMFIFO_CONS_TX_BUF_SIZE - tm_vdev->tx_head +
> > + tm_vdev->tx_tail;
> > + return len;
> > +}
>
> Is this custom implementation of some kind of circ_buf?

Yes. I'll try if I could re-use the circ_buf structure and update it in v11.

>
> > +/* Allocate vrings for the fifo. */
> > +static int mlxbf_tmfifo_alloc_vrings(struct mlxbf_tmfifo *fifo,
> > + struct mlxbf_tmfifo_vdev *tm_vdev)
> > +{
> > + struct mlxbf_tmfifo_vring *vring;
> > + struct device *dev;
> > + dma_addr_t dma;
> > + int i, size;
> > + void *va;
> > +
> > + for (i = 0; i < ARRAY_SIZE(tm_vdev->vrings); i++) {
> > + vring = &tm_vdev->vrings[i];
> > + vring->fifo = fifo;
> > + vring->num = MLXBF_TMFIFO_VRING_SIZE;
> > + vring->align = SMP_CACHE_BYTES;
> > + vring->index = i;
> > + vring->vdev_id = tm_vdev->vdev.id.device;
> > + dev = &tm_vdev->vdev.dev;
> > +
> > + size = vring_size(vring->num, vring->align);
> > + va = dma_alloc_coherent(dev->parent, size, &dma, GFP_KERNEL);
> > + if (!va) {
>
> > + dev_err(dev->parent, "dma_alloc_coherent failed\n");
>
> And how do you clean previously allocated items?

Fixed. Check the return value of mlxbf_tmfifo_alloc_vrings() and goto
'register_fail' (probably change to a better name) instead of 'fail'.
In such case the mlxbf_tmfifo_free_vrings() will be called to clean up
all allocated vrings.

>
> > + return -ENOMEM;
> > + }
> > +
> > + vring->va = va;
> > + vring->dma = dma;
> > + }
> > +
> > + return 0;
> > +}
>
> > +/* Disable interrupts of the fifo device. */
> > +static void mlxbf_tmfifo_disable_irqs(struct mlxbf_tmfifo *fifo)
> > +{
> > + int i, irq;
> > +
> > + for (i = 0; i < MLXBF_TM_MAX_IRQ; i++) {
> > + irq = fifo->irq_info[i].irq;
>
> > + if (irq) {
>
> I don't think this check is needed if you can guarantee that it has no
> staled records.

Yes, it's not needed any more according to the current code.
Will remove it in v11.

>
> > + fifo->irq_info[i].irq = 0;
> > + disable_irq(irq);
> > + }
> > + }
> > +}
>
> > +/* Get the number of available words in the TmFifo for sending. */
> > +static int mlxbf_tmfifo_get_tx_avail(struct mlxbf_tmfifo *fifo, int vdev_id)
> > +{
> > + int tx_reserve;
> > + u64 sts;
> > +
> > + /* Reserve some room in FIFO for console messages. */
> > + if (vdev_id == VIRTIO_ID_NET)
> > + tx_reserve = fifo->tx_fifo_size / MLXBF_TMFIFO_RESERVE_RATIO;
> > + else
> > + tx_reserve = 1;
> > +
> > + sts = readq(fifo->tx_base + MLXBF_TMFIFO_TX_STS);
>
> > + return (fifo->tx_fifo_size - tx_reserve -
> > + FIELD_GET(MLXBF_TMFIFO_TX_STS__COUNT_MASK, sts));
>
> Redundant parens.
> Moreover, consider
>
> u32 count; // or whatever suits for FIELD_GET().
> ...
>
> sts = readq(...);
> count = FIELD_GET(...);
> return ...;

Will update in v11.

>
> > +}
>
> > + while (size > 0) {
> > + addr = cons->tx_buf + cons->tx_head;
> > +
> > + if (cons->tx_head + sizeof(u64) <=
> > + MLXBF_TMFIFO_CONS_TX_BUF_SIZE) {
> > + memcpy(&data, addr, sizeof(u64));
> > + } else {
> > + partial = MLXBF_TMFIFO_CONS_TX_BUF_SIZE - cons->tx_head;
> > + memcpy(&data, addr, partial);
>
> > + memcpy((u8 *)&data + partial, cons->tx_buf,
> > + sizeof(u64) - partial);
>
> Unaligned access?!

The code here is to build and copy 8 bytes from the buffer into the 'data'
variable. The source could be unaligned. For example, 3 bytes are at the
end of the buffer and 5 bytes are at the beginning of the buffer. memcpy()
is used to do byte-stream copy which seems ok. Please correct me if
I misunderstand the comment.

>
> > + }
>
> > + buf.data = readq(fifo->rx_base + MLXBF_TMFIFO_RX_DATA);
> > + buf.data = le64_to_cpu(buf.data_le);
>
> Are you sure this is correct?
> How did you test this on BE architectures?

Thanks for the comment! Same as above, the conversion is not really needed.
I'll remove them in v11. As for testing, we only have arm64 little-endian Linux
running on the SoC. This conversion doesn't make much difference for the SoC.
As for BE architecture, we mainly verify the other side of the FIFO, which is the
external host like using ppc64.

>
> > + tm_vdev = devm_kzalloc(dev, sizeof(*tm_vdev), GFP_KERNEL);
>
> Is it appropriate use of devm_* ?

This is SoC, the device won't be closed or detached. The only case is when
the driver is unloaded. So it appears ok to use devm_kzalloc() since it's
allocated during probe() and released during module unload . Please
correct me if I misunderstand it.

>
> > + if (!tm_vdev) {
> > + ret = -ENOMEM;
> > + goto fail;
> > + }
>
> > +/* Read the configured network MAC address from efi variable. */
> > +static void mlxbf_tmfifo_get_cfg_mac(u8 *mac)
> > +{
> > + efi_guid_t guid = EFI_GLOBAL_VARIABLE_GUID;
> > + efi_status_t status;
> > + unsigned long size;
>
> > + u8 buf[6];
>
> ETH_ALEN ?

Will update it in v11

>
> > +
> > + size = sizeof(buf);
>
> Ditto.

Will update it in v11

>
> > + status = efi.get_variable(mlxbf_tmfifo_efi_name, &guid, NULL, &size,
> > + buf);
>
> > + if (status == EFI_SUCCESS && size == sizeof(buf))
>
> Ditto.

Will update it in v11

>
> > + memcpy(mac, buf, sizeof(buf));
>
> ether_addr_copy().

Will update it in v11

>
> > +}
>
> > + memcpy(net_config.mac, mlxbf_tmfifo_net_default_mac, 6);
>
> ether_addr_copy()...
>
> > + mlxbf_tmfifo_get_cfg_mac(net_config.mac);
>
> ... but actually above should be part of this function.

Will update it in v11

>
> --
> With Best Regards,
> Andy Shevchenko