Re: [bisected] Stack overflow after fs: "switch the IO-triggering parts of umount to fs_pin" (was net namespaces kernel stack overflow)

From: Alexander Aring
Date: Thu Apr 19 2018 - 14:37:44 EST


Hi,

On Thu, Apr 19, 2018 at 12:56 PM, Kirill Tkhai <ktkhai@xxxxxxxxxxxxx> wrote:
> On 19.04.2018 19:44, Al Viro wrote:
>> On Thu, Apr 19, 2018 at 04:34:48PM +0100, Al Viro wrote:
>>
>>> IOW, we only get there if our vfsmount was an MNT_INTERNAL one.
>>> So we have mnt->mnt_umount of some MNT_INTERNAL mount found in
>>> ->mnt_pins of some other mount. Which, AFAICS, means that
>>> it used to be mounted on that other mount. How the hell can
>>> that happen?
>>>
>>> It looks like you somehow get a long chain of MNT_INTERNAL mounts
>>> stacked on top of each other, which ought to be prevented by
>>> mnt_flags &= ~MNT_INTERNAL_FLAGS;
>>> in do_add_mount(). Nuts...
>>
>> Arrrrrgh... Nuts is right - clone_mnt() preserves the sodding
>> MNT_INTERNAL, with obvious results.
>>
>> netns is related to the problem, by exposing MNT_INTERNAL mounts
>> (in /proc/*/ns/*) for mount --bind to copy and attach to the
>> tree. AFAICS, the minimal reproducer is
>>
>> touch /tmp/a
>> unshare -m sh -c 'for i in `seq 10000`; do mount --bind /proc/1/ns/net /tmp/a; done'
>>
>> (and it can be anything in /proc/*/ns/*, really)
>>
>> I think the fix should be along the lines of the following:
>>
>> Don't leak MNT_INTERNAL away from internal mounts
>>
>> We want it only for the stuff created by SB_KERNMOUNT mounts, *not* for
>> their copies.
>>
>> Cc: stable@xxxxxxxxxx
>> Signed-off-by: Al Viro <viro@xxxxxxxxxxxxxxxxxx>
>
> Flawless victory! Thanks.
>

Thanks to all.

Also thanks to Kirill for helping me here and doing the main part by
bisecting this issue.

Finally, my testing stuff which produced this bug also works well now.

Tested-by: Alexander Aring <aring@xxxxxxxxxxxx>

- Alex