Re: [RFC PATCH v3 1/3] blk-mq: Clean up references to old requests when freeing rqs

From: John Garry
Date: Mon Mar 08 2021 - 06:21:54 EST


On 06/03/2021 02:52, Khazhy Kumykov wrote:
On Fri, Mar 5, 2021 at 7:20 AM John Garry <john.garry@xxxxxxxxxx> wrote:

It has been reported many times that a use-after-free can be intermittently
found when iterating busy requests:

- https://lore.kernel.org/linux-block/8376443a-ec1b-0cef-8244-ed584b96fa96@xxxxxxxxxx/
- https://lore.kernel.org/linux-block/5c3ac5af-ed81-11e4-fee3-f92175f14daf@xxxxxxx/T/#m6c1ac11540522716f645d004e2a5a13c9f218908
- https://lore.kernel.org/linux-block/04e2f9e8-79fa-f1cb-ab23-4a15bf3f64cc@xxxxxxxxx/

The issue is that when we switch scheduler or change queue depth, there may
be references in the driver tagset to the stale requests.

As a solution, clean up any references to those requests in the driver
tagset. This is done with a cmpxchg, so that the cleanup is safe against a
racing update of the driver tagset request from another queue.
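
For illustration only, the cmpxchg-based cleanup described above can be
sketched in userspace C11 as follows. This is not the patch itself:
struct request and struct tagset here are simplified stand-ins, and
clear_stale_refs() and NR_TAGS are hypothetical names chosen for the
example. C11 atomic_compare_exchange_strong() stands in for the kernel's
cmpxchg().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define NR_TAGS 4	/* hypothetical tagset depth for this sketch */

/* Simplified stand-ins; not the kernel's struct request / blk_mq_tags. */
struct request {
	int tag;
};

struct tagset {
	_Atomic(struct request *) rqs[NR_TAGS];
};

/*
 * Clear every slot that still points at @stale, a request about to be
 * freed. The compare-exchange only writes NULL if the slot still holds
 * @stale, so a request stored concurrently by another queue is left
 * untouched. Returns the number of slots cleared.
 */
static int clear_stale_refs(struct tagset *ts, struct request *stale)
{
	int cleared = 0;

	assert(ts != NULL);

	for (size_t i = 0; i < NR_TAGS; i++) {
		struct request *expected = stale;

		if (atomic_compare_exchange_strong(&ts->rqs[i], &expected,
						   (struct request *)NULL))
			cleared++;
	}

	return cleared;
}
```

A plain store of NULL would also drop the stale pointer, but it could
overwrite a valid request that another queue had just installed in the
same slot; the compare-exchange avoids that.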

I noticed this crash recently when running blktests on a "debug"
config on a 4.15-based kernel (it would always crash), and backporting
this change fixes it. (Testing on Linus's latest tree also confirmed
the fix, with the same config.) I realize I'm late to the
conversation, but appreciate the investigation and fixes :)

Good to know. I'll explicitly cc you on further versions.

Thanks,
John