Re: [PATCH net] atl1c: fix error return code in atl1c_probe()

From: Chris Snook
Date: Tue Nov 17 2020 - 04:22:44 EST


On Tue, Nov 17, 2020 at 1:01 AM Heiner Kallweit <hkallweit1@xxxxxxxxx> wrote:
>
> Am 17.11.2020 um 08:43 schrieb Chris Snook:
> > The full text of the preceding comment explains the need:
> >
> > /*
> > * The atl1c chip can DMA to 64-bit addresses, but it uses a single
> > * shared register for the high 32 bits, so only a single, aligned,
> > * 4 GB physical address range can be used at a time.
> > *
> > * Supporting 64-bit DMA on this hardware is more trouble than it's
> > * worth. It is far easier to limit to 32-bit DMA than update
> > * various kernel subsystems to support the mechanics required by a
> > * fixed-high-32-bit system.
> > */
> >
> > Without this, we get data corruption and crashes on machines with 4 GB
> > of RAM or more.
> >
> > - Chris
> >
> > On Mon, Nov 16, 2020 at 11:14 PM Heiner Kallweit <hkallweit1@xxxxxxxxx> wrote:
> >>
> >> Am 17.11.2020 um 03:55 schrieb Zhang Changzhong:
> >>> Fix to return a negative error code from the error handling
> >>> case instead of 0, as done elsewhere in this function.
> >>>
> >>> Fixes: 85eb5bc33717 ("net: atheros: switch from 'pci_' to 'dma_' API")
> >>> Reported-by: Hulk Robot <hulkci@xxxxxxxxxx>
> >>> Signed-off-by: Zhang Changzhong <zhangchangzhong@xxxxxxxxxx>
> >>> ---
> >>> drivers/net/ethernet/atheros/atl1c/atl1c_main.c | 4 ++--
> >>> 1 file changed, 2 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/drivers/net/ethernet/atheros/atl1c/atl1c_main.c b/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
> >>> index 0c12cf7..3f65f2b 100644
> >>> --- a/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
> >>> +++ b/drivers/net/ethernet/atheros/atl1c/atl1c_main.c
> >>> @@ -2543,8 +2543,8 @@ static int atl1c_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
> >>> * various kernel subsystems to support the mechanics required by a
> >>> * fixed-high-32-bit system.
> >>> */
> >>> - if ((dma_set_mask(&pdev->dev, DMA_BIT_MASK(32)) != 0) ||
> >>> - (dma_set_coherent_mask(&pdev->dev, DMA_BIT_MASK(32)) != 0)) {
> >>> + err = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(32));
> >>
> >> I wonder whether you need this call at all, because 32bit is the default.
> >> See following
> >>
> >> "By default, the kernel assumes that your device can address 32-bits
> >> of DMA addressing."
> >>
> >> in https://www.kernel.org/doc/Documentation/DMA-API-HOWTO.txt
> >>
> >>> + if (err) {
> >>> dev_err(&pdev->dev, "No usable DMA configuration,aborting\n");
> >>> goto err_dma;
> >>> }
> >>>
> >>
>
> Please don't top-post.
> >From what I've seen the kernel configures 32bit as default DMA size.
> See beginning of pci_device_add(), there the coherent mask is set to 32bit.
>
> And in pci_setup_device() see the following:
> /*
> * Assume 32-bit PCI; let 64-bit PCI cards (which are far rarer)
> * set this higher, assuming the system even supports it.
> */
> dev->dma_mask = 0xffffffff;
>
>
> That means if you would like to use 64bit DMA then you'd need to configure this explicitly.
> You could check to which mask dev->dma_mask and dev->coherent_dma_mask are set
> w/o the call to dma_set_mask_and_coherent.

I don't remember the exact history with atl1c, but we really did hit
this bug with atl1 and atl2. I'm not sure if that's because this
default wasn't there or if it's because because another call was
replaced with this call, but either way it's quite likely that at some
point in the future someone who doesn't even have test hardware will
try to port this to a newer interface that doesn't make the same
assumption, and bad things will happen. This isn't a hot path, so it's
better to be explicit. If dma_set_mask_and_coherent() ever takes a
long time or fails, something is seriously wrong and we probably want
to know about it before we start DMAing.

- Chris