Re: 5.14-rc failure to resume

From: Linus Torvalds
Date: Sat Jul 24 2021 - 16:48:36 EST


On Sat, Jul 24, 2021 at 12:44 PM Jens Axboe <axboe@xxxxxxxxx> wrote:
>
> This does appear to be the culprit. With it reverted on top of current
> master (and with the block and io_uring changes pulled in too), the
> kernel survives many resumes without issue.

That commit seems fundamentally buggy.

It makes "acpi_dev_get_next_match_dev()" always do

	acpi_dev_put(adev);

to put the previous device, but "adev" is perfectly valid as NULL, and
acpi_dev_get_next_match_dev() even tests for it:

	struct device *start = adev ? &adev->dev : NULL;

so it can - and will - do

	acpi_dev_put(NULL);

which does

	put_device(&adev->dev);

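and passes in an invalid pointer to put_device(). Since "dev" isn't the
first member of struct acpi_device, &adev->dev with a NULL adev isn't
NULL but a small bogus offset, so put_device()'s own NULL check doesn't
catch it.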
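
A minimal C sketch of why the NULL check has to be done on adev itself,
before the member address is taken. The types and functions here
(fake_device, fake_acpi_device, fake_put_device, fake_acpi_dev_put) are
hypothetical stand-ins, not the kernel's definitions. The sketch assumes,
as in the real struct, that the embedded device is not the first member.
Making the put helper tolerate NULL is one possible fix, shown here as an
illustration and not necessarily the patch that ends up merged:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; only the member layout matters. */
struct fake_device {
	int refcount;
};

struct fake_acpi_device {
	int device_type;          /* some member placed before 'dev' */
	struct fake_device dev;   /* embedded, NOT the first member */
};

/* Modeled on put_device(): it quietly ignores a NULL pointer. */
static void fake_put_device(struct fake_device *dev)
{
	if (!dev)
		return;
	assert(dev->refcount > 0);  /* catch reference-count underflow */
	dev->refcount--;
}

/*
 * With adev == NULL, &adev->dev is undefined behavior in C; in practice
 * it evaluates to NULL + offsetof(struct fake_acpi_device, dev), which is
 * non-NULL, so fake_put_device()'s check would not save us. Testing adev
 * first avoids forming that pointer at all.
 */
static void fake_acpi_dev_put(struct fake_acpi_device *adev)
{
	if (adev)
		fake_put_device(&adev->dev);
}
```

Called with a real device it drops one reference; called with NULL it
does nothing, which is what a "start from the beginning" caller needs.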

And yes, that adev very much can be NULL, with drivers/acpi/utils.c
even passing it in explicitly:

	struct acpi_device *
	acpi_dev_get_first_match_dev(const char *hid, const char *uid, s64 hrv)
	{
		return acpi_dev_get_next_match_dev(NULL, hid, uid, hrv);
	}
	EXPORT_SYMBOL(acpi_dev_get_first_match_dev);
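
To illustrate the first/next pattern, here is a hedged C sketch of a
"get next match, drop the previous one" iterator. The device table,
hardware IDs, and function names (fake_adev, fake_get_next_match,
fake_get_first_match) are hypothetical and only model the shape of
acpi_dev_get_next_match_dev(); the uid/hrv matching is left out. The
point it shows is that "first" is just "next" with a NULL previous
device, so the put on the previous device has to accept NULL:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical device table standing in for the real ACPI device walk. */
struct fake_adev {
	const char *hid;
	int refcount;
};

static struct fake_adev fake_devices[] = {
	{ "HID0001", 1 }, { "HID0002", 1 }, { "HID0001", 1 },
};
#define NR_FAKE_DEVICES (sizeof(fake_devices) / sizeof(fake_devices[0]))

/* NULL-tolerant put: the very first call in a walk passes NULL. */
static void fake_adev_put(struct fake_adev *adev)
{
	if (adev)
		adev->refcount--;
}

/*
 * Return the next device after 'adev' whose hid matches, with a
 * reference taken, and drop the caller's reference on 'adev'.
 * adev == NULL means "start from the beginning".
 */
static struct fake_adev *fake_get_next_match(struct fake_adev *adev,
					     const char *hid)
{
	size_t start = adev ? (size_t)(adev - fake_devices) + 1 : 0;
	struct fake_adev *next = NULL;

	for (size_t i = start; i < NR_FAKE_DEVICES; i++) {
		if (strcmp(fake_devices[i].hid, hid) == 0) {
			next = &fake_devices[i];
			next->refcount++;
			break;
		}
	}
	fake_adev_put(adev);   /* NULL on the first call */
	return next;
}

static struct fake_adev *fake_get_first_match(const char *hid)
{
	return fake_get_next_match(NULL, hid);
}
```

A caller walks all matches with
for (d = fake_get_first_match(hid); d; d = fake_get_next_match(d, hid)),
and once the loop finishes every reference it took has been dropped again.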

Am I missing something? How does that code work at all for anybody?

I probably _am_ missing something.

Linus