Re: [PATCH v2] nvme-tcp: Check if request has started before processing it

From: Sagi Grimberg
Date: Fri May 07 2021 - 16:26:20 EST



I would also mention, that a crash is not even the scariest issue that
we can see here, because if the request happened to be reused we are
in the silent data corruption realm...

If this does happen, I think we have to come up with some way to
mitigate it. We're not utilizing the full 16 bits of the command_id, so
maybe we can append something like a generation sequence number that can
be checked for validity.

That's actually a great idea. scsi needs unique tags so it encodes the
hwq in the upper 16 bits giving the actual tag the lower 16 bits which
is more than enough for a single queue. We can do the same with
a gencnt that will increment in both submission and completion and we
can validate against it.

This will be useful for all transports, so maintaining it in
nvme_req(rq)->genctr and introducing a helper like:
rq = nvme_find_tag(tagset, cqe->command_id)
That will filter genctr, locate the request.

Also:
nvme_validate_request_gen(rq, cqe->command_id) that would
compare against it.


And then a helper to set the command_id like:
cmd->common.command_id = nvme_request_command_id(rq)
that will both increment the genctr and build a command_id
from it.

Thoughts?


Well, that would require a modification to the CQE specification, no?
fmds was not amused when I proposed that :-(

Why would that require a modification to the CQE? it's just using say
4 msbits of the command_id to a running sequence...