Re: DM's filesystem lookup in dm_get_dev_t() [was: Re: linux-next: manual merge of the device-mapper tree with Linus' tree]
From: Hannes Reinecke
Date: Tue Dec 22 2020 - 12:25:10 EST
On 12/22/20 3:53 PM, Mike Snitzer wrote:
[added linux-block and dm-devel, if someone replies to this email to
continue "proper discussion" _please_ at least drop sfr and linux-next
from Cc]
On Tue, Dec 22 2020 at 8:15am -0500,
Christoph Hellwig <hch@xxxxxx> wrote:
Mike, Hannes,
I think this patch is rather harmful. Why does device mapper even
mix file system path with a dev_t and all the other weird forms
parsed by name_to_dev_t, which was supposed to be be for the early
init code where no file system is available.
OK, I'll need to revisit (unless someone beats me to it) because this
could've easily been a blind-spot for me when the dm-init code went in.
Any dm-init specific enabling interface shouldn't be used by more
traditional DM interfaces. So Hannes' change might be treating symptom
rather than the core problem (which would be better treated by factoring
out dm-init requirements for a name_to_dev_t()-like interface?).
DM has supported passing maj:min and blockdev names on DM table lines
forever... so we'll need to be very specific about where/why things
regressed.
Ok. The problem from my perspective is that device-mapper needs to
a) ensure that the arbitrary string passed in with the table definition
refers to a valid block device
and
b) the block device can be opened with O_EXCL, so that device-mapper can
then use it.
Originally (ie prior to commit 644bda6f3460) dm_get_device() just
converted the string to a 'dev_t' representation, and then the block
device itself was checked and opened in dm_get_table_device().
'lookup_bdev' was just being used to convert the path if the string was
not in the canonical major:minor format, as then it was assumed that it
referred to a block device node, and then lookup_bdev kinda makes sense.
However, lookup_bdev() now always recurses into the filesystem, causing
multipath to stall in an all-paths-down scenario.
So, the real issue is the table definiton; as it also accepts a device
to be specified by the block device _node_ name, we need to have a way
of converting that into a dev_t.
If lookup_bdev() is the wrong interface for that, by all means, please,
do tell me. I'd be happy to draft up a patch.
Alternatively, if Mike says that only major:minor is the valid format
for a table definition we can kill that code completely. But clearly _I_
cannot make the call here.
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@xxxxxxx +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer