Re: Regression: memory corruption on Atmel SAMA5D31

From: Peter Rosin
Date: Thu Mar 10 2022 - 04:59:08 EST


[bringing this threadlet back to the lists, hope that's ok]

On 2022-03-10 09:27, Nicolas Ferre wrote:
> On 09/03/2022 at 12:42, Peter Rosin wrote:
>> On 2022-03-09 11:38, Nicolas Ferre wrote:
>>> Hi Peter,
>>>

*snip*

>>> One of my colleagues had an idea about this issue and in particular with
>>> the fact that removing some of the entries in the structure triggered
>>> the problem: "isn't it some kind of misalignment between structures that
>>> are supposed to be treated in 64 bits machines and our 32 bits core that
>>> we use?"
>>> This misalignment or "wrong assumption" of using 64 bits machine might
>>> be present in the USB stack as it seems to be related to this sub-system
>>> somehow.
>>
>> Yes, something like that has been creeping around in the back of my
>> head too. And it could be something much later in struct device that
>> is no longer sufficiently aligned when struct dev_links_info changes.
>> But what?

I verified the alignment of various things. With the old working
struct dev_links_info, i.e.

struct dev_links_info {
struct list_head suppliers;
struct list_head consumers;
struct list_head needs_suppliers;
struct list_head defer_sync;
bool need_for_probe;
enum dl_dev_state status;
};

I get

sizeof(struct device) 440
sizeof(struct dev_links_info) 40
offsetof(struct device, links) 80
offsetof(struct device, power) 120

"power" is the next member after "struct dev_links_info links" in
struct device, and I find no other uses of struct dev_links_info.
With the new problematic layout, i.e.

struct dev_links_info {
struct list_head suppliers;
struct list_head consumers;
struct list_head defer_sync;
enum dl_dev_state status;
};

I get:

sizeof(struct device) 432
sizeof(struct dev_links_info) 28
offsetof(struct device, links) 80
offsetof(struct device, power) 112

Which means that everything around and within dev_links_info is 8-byte
aligned in the same way in either case. The exception being that
"status" no longer shares 8-byte space with "need_for_probe" (which is
gone). But that should only make things better, no?

That combined with the test with this permuted version (swapped two
list_heads in the middle):

struct dev_links_info {
struct list_head suppliers;
struct list_head consumers;
struct list_head defer_sync;
struct list_head needs_suppliers;
bool need_for_probe;
enum dl_dev_state status;
};

which displayed a new failure mode (BUG instead of corruption, see
upthread) indicates that this is not an alignment issue. Famous last
words...

> From that article:
> https://lwn.net/Articles/885941/
>
> I read:

> "Koschel included a patch fixing a bug in the USB subsystem where the
> iterator passed to this macro was used after the exit from the macro,
> which is a dangerous thing to do. Depending on what happens within the
> list, the contents of that iterator could be something surprising, even
> in the absence of speculative execution. Koschel fixed the problem by
> reworking the code in question to stop using the iterator after the loop. "
>
> USB subsystem, "struct list_head *next, *prev;"... Some keywords present
> there... worth a try?
>
> Regards,
> Nicolas

gr_udc.c is not built with the config that is in use, which is sad because
it looked like a good candidate.

Cheers,
Peter