Re: [PATCH v2] nvme-tcp: Check if request has started before processing it

From: Sagi Grimberg
Date: Wed Mar 31 2021 - 18:38:17 EST

Next message: Thomas Gleixner: "Re: [PATCH v8 3/6] stack: Optionally randomize kernel stack offset each syscall"
Previous message: Dave Hansen: "Re: [PATCH v4 1/1] x86/tdx: Handle MWAIT, MONITOR and WBINVD"
In reply to: Sagi Grimberg: "Re: [PATCH v2] nvme-tcp: Check if request has started before processing it"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

It is, but in this situation, the controller is sending a second
completion that results in a use-after-free, which makes the
transport irrelevant. Unless there is some other flow (which is
unclear
to me) that causes this which is a bug that needs to be fixed rather
than hidden with a safeguard.

The kernel should not crash regardless of any network traffic that is
sent to the system. It should not be possible to either intentionally
of mistakenly contruct packets that will deny service in this way.

This is not specific to nvme-tcp. I can build an rdma or pci controller
that can trigger the same crash... I saw a similar patch from Hannes
implemented in the scsi level, and not the individual scsi transports..

If scsi wants this too, this could be made generic at the blk-mq level.
We just need to make something like blk_mq_tag_to_rq(), but return NULL
if the request isn't started.

Makes sense...

I would also mention, that a crash is not even the scariest issue that
we can see here, because if the request happened to be reused we are
in the silent data corruption realm...

If this does happen, I think we have to come up with some way to
mitigate it. We're not utilizing the full 16 bits of the command_id, so
maybe we can append something like a generation sequence number that can
be checked for validity.

That's actually a great idea. scsi needs unique tags so it encodes the
hwq in the upper 16 bits giving the actual tag the lower 16 bits which
is more than enough for a single queue. We can do the same with
a gencnt that will increment in both submission and completion and we
can validate against it.

This will be useful for all transports, so maintaining it in
nvme_req(rq)->genctr and introducing a helper like:
rq = nvme_find_tag(tagset, cqe->command_id)
That will filter genctr, locate the request.

Also:
nvme_validate_request_gen(rq, cqe->command_id) that would
compare against it.

And then a helper to set the command_id like:
cmd->common.command_id = nvme_request_command_id(rq)
that will both increment the genctr and build a command_id
from it.

Thoughts?

Next message: Thomas Gleixner: "Re: [PATCH v8 3/6] stack: Optionally randomize kernel stack offset each syscall"
Previous message: Dave Hansen: "Re: [PATCH v4 1/1] x86/tdx: Handle MWAIT, MONITOR and WBINVD"
In reply to: Sagi Grimberg: "Re: [PATCH v2] nvme-tcp: Check if request has started before processing it"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]