Re: [PATCH] nvme-tcp: Check if request has started before processing it

From: Sagi Grimberg
Date: Mon Feb 15 2021 - 16:30:47 EST



blk_mq_tag_to_rq() will always return a request if the command_id is
in the valid range. Check if the request has been started. If we
blindly process the request we might double complete a request which
can be fatal.

How did you get to this one? did the controller send a completion for
a completed/bogus request?

If that is the case, then that must mean it's possible the driver could
have started the command id just before the bogus completion check. Data
iorruption, right?

'during TCP LIF toggles and aggr relocates' testing the host
crashes. TBH, I do not really know what is happening or what the test
does. Still trying to figure out what's going on.

Well, I think we should probably figure out why that is happening first.

I was just very
surprised how much the code trusts the other side to behave correctly./

What does pci/rdma/fc does differently? What does scsi do here?

Yes, which is why I don't think this check is very useful..

I actually view that as a valid protection against spoofed frames.
Without it it's easy to crash the machine by injecting fake completions with
random command ids.

In this test scenario it's not even a spoofed frame; maybe just a
confused controller.

Maybe... I am still not sure how this patch helps here though...