Re: usb: dwc2: regression on MyBook Live Duo / Canyonlands since 4.3.0-rc4

From: Christian Lamparter
Date: Wed May 18 2016 - 15:15:08 EST


On Tuesday, May 17, 2016 04:50:48 PM John Youn wrote:
> On 5/14/2016 6:11 AM, Christian Lamparter wrote:
> > On Thursday, May 12, 2016 11:40:28 AM John Youn wrote:
> >> On 5/12/2016 6:30 AM, Christian Lamparter wrote:
> >>> On Thursday, May 12, 2016 01:55:44 PM Arnd Bergmann wrote:
> >>>> On Thursday 12 May 2016 11:58:18 Christian Lamparter wrote:
> >>>>>>>> Detecting the endianess of the
> >>>>>>>> device is probably the best future-proof solution, but it's also
> >>>>>>>> considerably more work to do in the driver, and comes with a
> >>>>>>>> tiny runtime overhead.
> >>>>>>>
> >>>>>>> The runtime overhead is probably non-measurable compared with the cost
> >>>>>>> of the actual MMIOs.
> >>>>>>
> >>>>>> Right. The code size increase is probably measurable (but still small),
> >>>>>> the runtime overhead is not.
> >>>>>
> >>>>> Ok, so no rebuts or complains have been posted.
> >>>>>
> >>>>> I've tested the patch you made in: https://lkml.org/lkml/2016/5/9/354
> >>>>> and it works:
> >>>>>
> >>>>> Tested-by: Christian Lamparter <chunkeey@xxxxxxxxxxxxxx>
> >>>>>
> >>>>> So, how do we go from here? There is are two small issues with the
> >>>>> original patch (#ifdef DWC2_LOG_WRITES got converted to lower case:
> >>>>> #ifdef dwc2_log_writes) and I guess a proper subject would be nice.
> >>>>>
> >>>>> Arnd, can you please respin and post it (cc'd stable as well)?
> >>>>> So this is can be picked up? Or what's your plan?
> >>>>
> >>>> (I just realized my reply was stuck in my outbox, so the patch
> >>>> went out first)
> >>>>
> >>>> If I recall correctly, the rough consensus was to go with your longer
> >>>> patch in the future (fixed up for the comments that BenH and
> >>>> I sent), and I'd suggest basing it on top of a fixed version of
> >>>> my patch.
> >>> Well, but it comes with the "overhead"! So this was just as I said:
> >>> "Let's look at it and see if it's any good"... And I think it isn't
> >>> since the usb/host/ehci people also opted for #ifdef CONFIG_BIG_ENDIAN
> >>> archs etc...
> >>
> >> I slightly prefer the more general patch for future kernel versions.
> >> The overhead will probably be negligible, but we can perform some
> >> testing to make sure.
> >>
> >> Can you resubmit with all gathered feedback?
> >
> > Yes, here are the changes.
> >
> > I've tested it on my MyBook Live Duo. The usbotg comes right up:
> > [12610.540004] dwc2 4bff80000.usbotg: USB bus 1 deregistered
> > [12612.513934] dwc2 4bff80000.usbotg: Specified GNPTXFDEP=1024 > 256
> > [12612.518756] dwc2 4bff80000.usbotg: EPs: 3, shared fifos, 2042 entries in SPRAM
> > [12612.530112] dwc2 4bff80000.usbotg: DWC OTG Controller
> > [12612.533948] dwc2 4bff80000.usbotg: new USB bus registered, assigned bus number 1
> > [12612.540083] dwc2 4bff80000.usbotg: irq 33, io mem 0x00000000
> >
> > John: Can you run some perf test with it?
> >
> > I've based this on:
> >
> > commit 6ea2fffc9057a67df1994d85a7c085d899eaa25a
> > Author: Arnd Bergmann <arnd@xxxxxxxx>
> > Date: Fri May 13 15:52:27 2016 +0200
> >
> > usb: dwc2: fix regression on big-endian PowerPC/ARM systems
> >
> > so naturally, it needs to be applied first.
> > Most of the conversion work was done by the attached
> > coccinelle semantic patches.
> >
> > I had to edit the __bic32 and __orr32 helpers by hand.
> > As well as some debugfs code and stuff in gadget.c.
> >
>
> Thanks Christian.
>
> I'll keep this in our internal tree and send it to Felipe later. This
> causes a bunch of conflicts that I have to fix up and I should do a
> bit of testing as well.
>
> And since there is a patch that fixes the regression this is can wait.
>
> Regards,
> John
---
Hey, that's really nice of you to do that :-D. Please keep me in the
loop (Cc) for those then.

Yes, this needs definitely testing on all the affected ARCHs.
I've attached a diff to a updated version of the patch. It
drops the special MIPS case (as requested by Arnd).

BTW, I looked into the ioread32_rep and iowrite32_rep again. I'm
not entirely convinced that the hardware FIFOs are actually endian
neutral. But I can't verify it since my Western Digital My Book Live
only supports the host configuration (forces host mode), so I don't
know what a device in dual-mode or peripheral do here.

The reason why I think it was broken is because there's a PIO copy
to and from the HCFIFO(x) in dwc2_hc_write_packet and
dwc2_hc_read_packet access in the hcd.c file as well... And there,
the code was using the dwc2_readl and dwc2_writel to access the data.
I added special accessors for the FIFOS now:
dwc2_readl_rep and dwc2_writel_rep.

I went all the way and implemented the helpers to do unaligned access
if necessary (not sure if adding likely branches is a good idea, as
this could be either always true or false for a specific driver the
whole time).

NB: it also fixes a "regs variable not used in dwc2_hsotg_dump" warning
if DEBUG isn't selected.

NB2: If it you need a patch against a specific tree, please
let me know.
---
diff --git a/drivers/usb/dwc2/core.h b/drivers/usb/dwc2/core.h
index 2fa57cd..69030bb 100644
--- a/drivers/usb/dwc2/core.h
+++ b/drivers/usb/dwc2/core.h
@@ -42,6 +42,7 @@
#include <linux/usb/gadget.h>
#include <linux/usb/otg.h>
#include <linux/usb/phy.h>
+#include <asm/unaligned.h>
#include "hw.h"

/*
@@ -958,50 +959,6 @@ enum dwc2_halt_status {
DWC2_HC_XFER_URB_DEQUEUE,
};

-#ifdef CONFIG_MIPS
-/*
- * There are some MIPS machines that can run in either big-endian
- * or little-endian mode and that use the dwc2 register without
- * a byteswap in both ways.
- * Unlike other architectures, MIPS apparently does not require a
- * barrier before the __raw_writel() to synchronize with DMA but does
- * require the barrier after the __raw_writel() to serialize a set of
- * writes. This set of operations was added specifically for MIPS and
- * should only be used there.
- */
-static inline u32 dwc2_readl(struct dwc2_hsotg *hsotg,
- ptrdiff_t reg)
-{
- const void __iomem *addr = hsotg->regs + reg;
- u32 value = __raw_readl(addr);
-
- /*
- * In order to preserve endianness __raw_* operation is used. Therefore
- * a barrier is needed to ensure IO access is not re-ordered across
- * reads or writes
- */
- mb();
- return value;
-}
-
-static inline void dwc2_writel(struct dwc2_hsotg *hsotg, u32 value,
- ptrdiff_t reg)
-{
- const void __iomem *addr = hsotg->regs + reg;
- __raw_writel(value, addr);
-
- /*
- * In order to preserve endianness __raw_* operation is used. Therefore
- * a barrier is needed to ensure IO access is not re-ordered across
- * reads or writes
- */
- mb();
-#ifdef DWC2_LOG_WRITES
- pr_info("INFO:: wrote %08x to %p\n", value, addr);
-#endif
-}
-#else
-/* Normal architectures just use readl/write_be */
static inline u32 dwc2_readl(struct dwc2_hsotg *hsotg,
ptrdiff_t reg)
{
@@ -1014,7 +971,8 @@ static inline u32 dwc2_readl(struct dwc2_hsotg *hsotg,
}


-static inline void dwc2_writel(struct dwc2_hsotg *hsotg, u32 value,
+static inline void dwc2_writel(struct dwc2_hsotg *hsotg,
+ const u32 value,
ptrdiff_t reg)
{
void __iomem *addr = hsotg->regs + reg;
@@ -1028,7 +986,103 @@ static inline void dwc2_writel(struct dwc2_hsotg *hsotg, u32 value,
pr_info("info:: wrote %08x to %p\n", value, addr);
#endif
}
-#endif
+
+static inline void dwc2_readl_rep(struct dwc2_hsotg *hsotg,
+ ptrdiff_t reg,
+ u32 *buf, const size_t len)
+{
+ void __iomem *addr = hsotg->regs + reg;
+ size_t i, remaining = len & ~4;
+
+ if (hsotg->is_big_endian) {
+ if (likely(IS_ALIGNED(*buf, 0x4))) {
+ for (i = len >> 2; i > 0; i--)
+ *buf++ = ioread32be(addr);
+ } else {
+ /* xfer_buf is not DWORD aligned */
+ for (i = len >> 2; i > 0; i--) {
+ u32 data = ioread32be(addr);
+
+ put_unaligned(data, buf);
+ buf++;
+ }
+ }
+ } else {
+ /* little-endian accessors */
+ if (likely(IS_ALIGNED(*buf, 0x4))) {
+ for (i = len >> 2; i > 0; i--)
+ *buf++ = ioread32(addr);
+
+ } else {
+ /* xfer_buf is not DWORD aligned */
+ for (i = len >> 2; i > 0; i--) {
+ u32 data = ioread32be(addr);
+
+ put_unaligned(data, buf);
+ buf++;
+ }
+ }
+ }
+
+ if (unlikely(remaining)) {
+ u32 data_u32;
+ u8 *buf_u8 = (u8 *) buf;
+ u8 *data_u8 = (u8 *) &data_u32;
+
+ data_u32 = dwc2_readl(hsotg, reg);
+
+ while (remaining--)
+ *buf_u8++ = *data_u8++;
+ }
+}
+
+static inline void dwc2_writel_rep(struct dwc2_hsotg *hsotg,
+ const ptrdiff_t reg,
+ const u32 *buf, const size_t len)
+{
+ void __iomem *addr = hsotg->regs + reg;
+ size_t i, remaining = len & ~4;
+
+ if (hsotg->is_big_endian) {
+ if (likely(IS_ALIGNED(*buf, 0x4))) {
+ for (i = len >> 2; i > 0; i--)
+ iowrite32be(*buf++, addr);
+ } else {
+ /* xfer_buf is not DWORD aligned */
+ for (i = len >> 2; i > 0; i--) {
+ u32 data = get_unaligned(buf);
+
+ iowrite32be(data, addr);
+ buf++;
+ }
+ }
+ } else {
+ /* little-endian accessors */
+ if (likely(IS_ALIGNED(*buf, 0x4))) {
+ for (i = len >> 2; i > 0; i--)
+ iowrite32(*buf++, addr);
+ } else {
+ /* xfer_buf is not DWORD aligned */
+ for (i = len >> 2; i > 0; i--) {
+ u32 data = get_unaligned(buf);
+
+ iowrite32(data, addr);
+ buf++;
+ }
+ }
+ }
+
+ if (unlikely(remaining)) {
+ u32 data_u32;
+ u8 *buf_u8 = (u8 *) buf;
+ u8 *data_u8 = (u8 *) &data_u32;
+
+ while (remaining--)
+ *data_u8++ = *buf_u8++;
+
+ dwc2_writel(hsotg, data_u32, reg);
+ }
+}

extern int dwc2_detect_endiannes(struct dwc2_hsotg *hsotg);

diff --git a/drivers/usb/dwc2/gadget.c b/drivers/usb/dwc2/gadget.c
index 2c687d9..531b30f 100644
--- a/drivers/usb/dwc2/gadget.c
+++ b/drivers/usb/dwc2/gadget.c
@@ -317,7 +317,7 @@ static int dwc2_hsotg_write_fifo(struct dwc2_hsotg *hsotg,
u32 gnptxsts = dwc2_readl(hsotg, GNPTXSTS);
int buf_pos = hs_req->req.actual;
int to_write = hs_ep->size_loaded;
- void *data;
+ u32 *data;
int can_write;
int pkt_round;
int max_transfer;
@@ -457,10 +457,9 @@ static int dwc2_hsotg_write_fifo(struct dwc2_hsotg *hsotg,
if (periodic)
hs_ep->fifo_load += to_write;

- to_write = DIV_ROUND_UP(to_write, 4);
data = hs_req->req.buf + buf_pos;

- iowrite32_rep(hsotg->regs + EPFIFO(hs_ep->index), data, to_write);
+ dwc2_writel_rep(hsotg, EPFIFO(hs_ep->index), data, to_write);

return (to_write >= can_write) ? -ENOSPC : 0;
}
@@ -1439,12 +1438,11 @@ static void dwc2_hsotg_rx_data(struct dwc2_hsotg *hsotg, int ep_idx, int size)
{
struct dwc2_hsotg_ep *hs_ep = hsotg->eps_out[ep_idx];
struct dwc2_hsotg_req *hs_req = hs_ep->req;
- void __iomem *fifo = hsotg->regs + EPFIFO(ep_idx);
+ u32 *data;
int to_read;
int max_req;
int read_ptr;

-
if (!hs_req) {
u32 epctl = dwc2_readl(hsotg, DOEPCTL(ep_idx));
int ptr;
@@ -1455,7 +1453,7 @@ static void dwc2_hsotg_rx_data(struct dwc2_hsotg *hsotg, int ep_idx, int size)

/* dump the data from the FIFO, we've nothing we can do */
for (ptr = 0; ptr < size; ptr += 4)
- (void)dwc2_readl(hsotg, EPFIFO(ep_idx));
+ (void)__raw_readl(hsotg->regs + EPFIFO(ep_idx));

return;
}
@@ -1479,13 +1477,14 @@ static void dwc2_hsotg_rx_data(struct dwc2_hsotg *hsotg, int ep_idx, int size)

hs_ep->total_data += to_read;
hs_req->req.actual += to_read;
- to_read = DIV_ROUND_UP(to_read, 4);

/*
* note, we might over-write the buffer end by 3 bytes depending on
* alignment of the data.
*/
- ioread32_rep(fifo, hs_req->req.buf + read_ptr, to_read);
+ data = hs_req->req.buf + read_ptr;
+
+ dwc2_readl_rep(hsotg, EPFIFO(ep_idx), data, to_read);
}

/**
@@ -3411,7 +3416,6 @@ static void dwc2_hsotg_dump(struct dwc2_hsotg *hsotg)
{
#ifdef DEBUG
struct device *dev = hsotg->dev;
- void __iomem *regs = hsotg->regs;
u32 val;
int idx;

diff --git a/drivers/usb/dwc2/hcd.c b/drivers/usb/dwc2/hcd.c
index dcd6338..8568ff4 100644
--- a/drivers/usb/dwc2/hcd.c
+++ b/drivers/usb/dwc2/hcd.c
@@ -567,19 +567,10 @@ u32 dwc2_calc_frame_interval(struct dwc2_hsotg *hsotg)
void dwc2_read_packet(struct dwc2_hsotg *hsotg, u8 *dest, u16 bytes)
{
u32 *data_buf = (u32 *)dest;
- int word_count = (bytes + 3) / 4;
- int i;
-
- /*
- * Todo: Account for the case where dest is not dword aligned. This
- * requires reading data from the FIFO into a u32 temp buffer, then
- * moving it into the data buffer.
- */

dev_vdbg(hsotg->dev, "%s(%p,%p,%d)\n", __func__, hsotg, dest, bytes);

- for (i = 0; i < word_count; i++, data_buf++)
- *data_buf = dwc2_readl(hsotg, HCFIFO(0));
+ dwc2_readl_rep(hsotg, HCFIFO(0), data_buf, bytes);
}

/**
@@ -1236,10 +1227,8 @@ static void dwc2_set_pid_isoc(struct dwc2_host_chan *chan)
static void dwc2_hc_write_packet(struct dwc2_hsotg *hsotg,
struct dwc2_host_chan *chan)
{
- u32 i;
u32 remaining_count;
u32 byte_count;
- u32 dword_count;
u32 *data_buf = (u32 *)chan->xfer_buf;

if (dbg_hc(chan))
@@ -1251,20 +1240,7 @@ static void dwc2_hc_write_packet(struct dwc2_hsotg *hsotg,
else
byte_count = remaining_count;

- dword_count = (byte_count + 3) / 4;
-
- if (((unsigned long)data_buf & 0x3) == 0) {
- /* xfer_buf is DWORD aligned */
- for (i = 0; i < dword_count; i++, data_buf++)
- dwc2_writel(hsotg, *data_buf, HCFIFO(chan->hc_num));
- } else {
- /* xfer_buf is not DWORD aligned */
- for (i = 0; i < dword_count; i++, data_buf++) {
- u32 data = data_buf[0] | data_buf[1] << 8 |
- data_buf[2] << 16 | data_buf[3] << 24;
- dwc2_writel(hsotg, data, HCFIFO(chan->hc_num));
- }
- }
+ dwc2_writel_rep(hsotg, HCFIFO(chan->hc_num), data_buf, byte_count);

chan->xfer_count += byte_count;
chan->xfer_buf += byte_count;