Re: [REGRESSION] asix: Lots of asix_rx_fixup() errors and slow transmissions

From: John Stultz
Date: Tue May 03 2016 - 11:44:54 EST


On Tue, May 3, 2016 at 7:42 AM, David B. Robins <linux@xxxxxxxxxxxxxxx> wrote:
> On 2016-05-03 00:55, John Stultz wrote:
>>
>> Looking through the commits since the v4.1 kernel where we didn't see
>> this, I narrowed the regression down, and reverting the following two
>> commits seems to avoid the problem:
>>
>> 6a570814cd430fa5ef4f278e8046dcf12ee63f13 asix: Continue processing URB
>> if no RX netdev buffer
>> 3f30b158eba5c604b6e0870027eef5d19fc9271d asix: On RX avoid creating
>> bad Ethernet frames
>>
>
> I don't think the first one is giving you problems (except as triggered by
> the second) but I had concerns about the second myself (and emailed the
> author off-list, but received no reply), and we did not take that commit for
> our own product.

Yes, the first/later commit is being reverted as it modifies code that
was also changed by the second/earlier one. So the 3f30 patch doesn't
revert cleanly by itself, but I have tested by just removing the (now
modified by 6a57) chunk of code it adds does seem to avoid the problem
as well. Though I wasn't able to review things closely enough to be
confident that didn't introduce any subtle bugs in the remaining logic
in the 6a57 patch.

> Specifically, the second change, 3f30... (original patch:
> https://www.mail-archive.com/netdev@xxxxxxxxxxxxxxx/msg80720.html) (1)
> appears to do the exact opposite of what it claims, i.e., instead of "resync
> if this looks like a header", it does "resync if this does NOT look like a
> (packet) header", where "looks like a header" means "bits 0-10 (size) are
> equal to the bitwise-NOT of bits 16-26", and (2) can happen by coincidence
> for 1/2048 32-bit values starting a continuation URB (easy to hit dealing
> with large volumes of video data as we were). It appears to expect the
> header for every URB whereas the rest of the code at least expects it only
> once per network packet (look at following code that only reads it for
> remaining == 0).
>
> So that change made no sense to me, but I don't have significant kernel dev
> experience. Effectively it will drop/truncate every (2047/2048) split
> (longer than an URB) packet, and report an error for the second URB and then
> again for treating said second URB as a first URB for a packet. I would
> expect your problems will go away just removing the second change. You could
> also change the != to == in "if (size != ...)" but then you'd still have
> 1/2048 (depending on data patterns) false positives.

I'll have to look into this more. I'm not super familiar with usb or
networking, so I'm not sure I can judge the better approach.

thanks
-john