Re: [PATCH 0/2] Fix debugfs bind mount regression

From: Eric W. Biederman
Date: Wed Mar 09 2016 - 16:07:47 EST

Seth Forshee <seth.forshee@xxxxxxxxxxxxx> writes:

> Some full-OS container software bind mounts debugfs into containers to
> satisfy the assumptions of older userspaces which expect to be able to
> mount debugfs. This regressed in 4.1 due to the addition of tracefs,
> which gets automounted in the tracing subdirectory of debugfs. In a
> cloned mount namespace the bind mount now fails because the tracefs
> mount is a locked child of the debugfs mount.
>
> For new mounts we already make an exception to the "locked child mount"
> rule. Directories in pseudo filesystems created for the sole purpose of
> being mountpoints are created as permanently empty directories which can
> never contain any entries, therefore the kernel can know that any mounts
> on these directories are not for security purposes. These mounts are
> then excluded from locked mount tests in some circumstances.
>
> The same logic clearly applies to directories created in
> debugfs_create_automount(). The following patches update this function
> to create permanently empty directories for mountpoints and add an
> exclusion to the bind mount tests for child mounts on permanently
> empty directories.

So I don't know that this approach is bad. However, in reading through
your patch descriptions I do not see any consideration of using
"mount --rbind" instead of "mount --bind", i.e. adding the MS_REC flag
to your bind mount.

I would think simply using MS_REC would solve this problem, without
needing any additional kernel support. Am I missing something?
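
To make the suggestion concrete, here is a minimal, untested sketch of the
two variants at the mount(2) level. The helper names bind_flags() and
bind_mount() are made up for illustration. The only difference between
"mount --bind" and "mount --rbind" is the MS_REC bit, which asks the kernel
to copy the submounts (here the tracefs automount) along with the parent.

```c
#include <stdio.h>
#include <sys/mount.h>

/*
 * Flags for a bind mount. With recursive set, MS_REC is added so that
 * child mounts of the source are bound as well (mount --rbind).
 */
unsigned long bind_flags(int recursive)
{
	return recursive ? (MS_BIND | MS_REC) : MS_BIND;
}

/*
 * Hypothetical helper: bind src onto dst.
 * Returns 0 on success, -1 on failure (errno is set by mount(2)).
 */
int bind_mount(const char *src, const char *dst, int recursive)
{
	/* The filesystem type and data arguments are ignored for bind mounts. */
	if (mount(src, dst, NULL, bind_flags(recursive), NULL) < 0) {
		perror("mount");
		return -1;
	}
	return 0;
}
```

Assuming debugfs is at its usual /sys/kernel/debug, the container software
would call bind_mount("/sys/kernel/debug", dst, 1) rather than passing 0,
where dst is whatever target path it uses inside the container.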