Re: [PATCH] USB:bugfix a controller halt error
From: liulongfang
Date: Wed Aug 09 2023 - 21:20:42 EST
On 2023/7/27 23:57, Alan Stern wrote:
> On Thu, Jul 27, 2023 at 05:31:41PM +0200, Oliver Neukum wrote:
>> On 27.07.23 16:42, Alan Stern wrote:
>>> On Thu, Jul 27, 2023 at 03:03:57PM +0800, liulongfang wrote:
>>>> On 2023/7/26 22:20, Alan Stern wrote:
>>
>>>>> It seems to me that something along these lines must be necessary in
>>>>> any case. Unless the bad memory is cleared somehow, it would never be
>>>>> usable again. The kernel might deallocate it, then reallocate for
>>>>> another purpose, and then crash when the new user tries to access it.
>>>>>
>>>>> In fact, this scenario could still happen even with your patch, which
>>>>> means the patch doesn't really fix the problem.
>>
>> I suppose in theory you could have something like a bad blocks list
>> just for RAM, but that would really hurt. You'd have to do something
>> about every DMA operation in every driver in theory.
>>
>> Error handling would basically be an intentional memory leak.
>
> I started out thinking this way, but maybe that's not how it works.
> Perhaps simply overwriting the part of memory that got the ECC error
> would clear the error state. (This may depend on the kind of error,
> one-time vs. permanent.)
>
> If that's the case, and if the memory buffer was deallocated without
> being accessed and then later reallocated, things would be okay. The
> routine that reallocated the buffer wouldn't try to read from it before
> initializing it somehow.
>
>>>> This patch is only used to prevent data in the buffer from being accessed.
>>>> As long as the data is not accessed, the kernel does not crash.
>>>
>>> I still don't understand. You haven't provided nearly enough
>>> information. You should start by answering the questions that Oliver
>>> asked. Then answer this question:
>>>
>>> The code you are concerned about is this:
>>>
>>> r = usb_control_msg(udev, usb_rcvaddr0pipe(),
>>> USB_REQ_GET_DESCRIPTOR, USB_DIR_IN,
>>> USB_DT_DEVICE << 8, 0,
>>> buf, GET_DESCRIPTOR_BUFSIZE,
>>> initial_descriptor_timeout);
>>> switch (buf->bMaxPacketSize0) {
>>>
>>> You're worried that if an ECC memory error occurs during the
>>> usb_control_msg transfer, the kernel will crash when the "switch"
>>> statement tries to read the value of buf->bMaxPacketSize0. That's a
>>> reasonable thing to worry about.
>>
>> Albeit unlikely. If the hardware and implementation are reasonable
>> you'd return a specific error code from the HCD and clean up the
>> RAM in your ecc driver.
>>
>> The fix for USB would then conceptually be something like
>>
>> retryio:
>> r = usb_control_msg()
>> if (r == -EMEMORYCORRUPTION)
>> goto retryio;
>
> Yes, we could do this, but it's not necessary. Let's say that the HCD
> returns -EMEMORYCORRUPTION and the ecc driver cleans up the RAM
> (probably by resetting its contents to 0, but possibly leaving garbage
> there instead). Then when the following code in hub_port_init() tests
> buf->bMaxPacketSize0, it will see an invalid value and will retry the
> transfer.
>
> Or, with low probability, it will see a valid but incorrect value. If
> that happens then later transfers using ep0 will fail, causing the hub
> driver to reiterate the outer loop in hub_port_connect(). Eventually
> the device will be correctly initialized and enumerated.
>
> Alan Stern
>
OK, thanks.
Longfang.
> .
>