Re: [RFC PATCH 5/8] qaic: Implement data path

From: Arnd Bergmann
Date: Thu May 14 2020 - 18:21:16 EST


On Fri, May 15, 2020 at 12:06 AM Jeffrey Hugo <jhugo@xxxxxxxxxxxxxx> wrote:

> >> + u16 req_id;
> >> + u8 seq_id;
> >> + u8 cmd;
> >> + u32 resv;
> >> + u64 src_addr;
> >> + u64 dest_addr;
> >> + u32 len;
> >> + u32 resv2;
> >> + u64 db_addr; /* doorbell address */
> >> + u8 db_len; /* not a raw value, special encoding */
> >> + u8 resv3;
> >> + u16 resv4;
> >> + u32 db_data;
> >> + u32 sem_cmd0;
> >> + u32 sem_cmd1;
> >> + u32 sem_cmd2;
> >> + u32 sem_cmd3;
> >> +} __packed;
> >
> > All members are naturally aligned, so better drop the __packed
> > annotation get better code, unless the structure itself is
> > at an unaligned offset in memory.
>
> I'm going to have to disagree. While most "sane" compilers would not
> add extra padding, I've debugged enough issues in the past when
> sending/receiving data with foreign environments to never trust anything
> that isn't "packed".
>
> Unless I missed something in the C spec that requires naturally aligned
> structures to have an identical layout in memory, I'll take safety and
> functional correctness over performance.

The C standard does not mandate the layout and individual ABIs can
be different, but Linux does only runs on ABIs that have the correct
layout for the structure above

The problem is that the compiler will split up word accesses into bytes
on some architectures such as older ARM, to avoid unaligned
loads and stores. It should be fine if you add both __packed and
__aligned(8) to make the structure aligned again.

> >> + init_completion(&mem->xfer_done);
> >> + list_add_tail(&mem->list, &dbc->xfer_list);
> >> + tail = (tail + mem->nents) % dbc->nelem;
> >> + __raw_writel(cpu_to_le32(tail), dbc->dbc_base + REQTP_OFF);
> >
> > What is this __raw accessor for? This generally results in non-portable
> > code that should be replaced with writel() or something specific to
> > the bus on the architecture you deal with.
>
> The barrier(s) that comes with writel are unnecessary in this case.
> Since this is part of our critical path, we are sensitive to its
> performance.
>
> What are the portability issues around the __raw variant?

The access may not be atomic on all architectures but get either
split up or combined with adjacent accesses.

There is no guarantee about endianess: while most architectures
treat __raw access as a simple load/store, some might have an
implied byteswap.

__raw accesses might be completely reordered around other
mmio accesses or spinlocks.

If you want to avoid the barrier, there is the
writel_relaxed()/readl_relaxed() that skips most barriers,
so they can be reordered against MSI interrupts and
DMA accesses on architectures that allow (e.g. ARM
or PowerPC) but not against each other.

If you do use them, I'd still expect a comment that explains
why this instance is performance critical and how it is
synchronized against DMA or MSI if necessary.

Arnd