Re: [PATCH 0/2] Fix use-after-free bug when ports are removed

From: Sagi Grimberg
Date: Wed Jul 03 2019 - 15:20:09 EST



Can we handle this in the core instead (also so we'd be consistent
across transports)?

Yes, that would be much better if we can sort out some other issues below...

How about this untested patch instead?

I've found a couple of problems with the patch:

1) port->subsystems will always be empty before nvmet_disable_port() is
called. We'd have to restructure things a little perhaps so that when a
subsystem is removed from a port, all the active controllers for that
subsys/port combo would be removed.

Yes, that is better.

2) loop needs to call flush_workqueue(nvme_delete_wq) somewhere,
otherwise there's a small window after the port disappears while
commands can still be submitted. We can actually still hit the bug with
a tight loop.

We could simply flush the workqueue in nvme_loop_delete_ctrl for
each controller?

Might be an overkill though, and its risking circular locking in case
we are going via the fatal error path (work context flushing a different workqueue always annoys lockdep even when its perfectly safe)

Maybe there's other cleanup that could be done to solve this: it does
seem like all three transports need to keep their own lists of open
controllers and loops through them to delete them. In theory, that could
be made common so there's common code to manage the list per transport
which would remove some boiler plate code. If we want to go this route
though, I'd suggest using my patches as is for now and doing the cleanup
in the next cycle.

Then please fix tcp as well.