Re: [RFC][PATCH] mount: In mark_umount_candidates and __propogate_umount visit each mount once

From: Andrey Vagin
Date: Thu Oct 13 2016 - 22:31:34 EST


On Thu, Oct 13, 2016 at 2:46 PM, Andrei Vagin <avagin@xxxxxxxxxxxxx> wrote:
> On Thu, Oct 13, 2016 at 02:53:46PM -0500, Eric W. Biederman wrote:
>>
>> Adrei Vagin pointed out that time to executue propagate_umount can go
>> non-linear (and take a ludicrious amount of time) when the mount
>> propogation trees of the mounts to be unmunted by a lazy unmount
>> overlap.
>>
>> Solve this in the most straight forward way possible, by adding a new
>> mount flag to mark parts of the mount propagation tree that have been
>> visited, and use that mark to skip parts of the mount propagation tree
>> that have already been visited during an unmount. This guarantees
>> that each mountpoint in the possibly overlapping mount propagation
>> trees will be visited exactly once.
>>
>> Add the functions propagation_visit_next and propagation_revisit_next
>> to coordinate setting and clearling the visited mount mark.
>>
>> Here is a script to generate such mount tree:
>> $ cat run.sh
>> mount -t tmpfs test-mount /mnt
>> mount --make-shared /mnt
>> for i in `seq $1`; do
>> mkdir /mnt/test.$i
>> mount --bind /mnt /mnt/test.$i
>> done
>> cat /proc/mounts | grep test-mount | wc -l
>> time umount -l /mnt
>> $ for i in `seq 10 16`; do echo $i; unshare -Urm bash ./run.sh $i; done
>>
>> Here are the performance numbers with and without the patch:
>>
>> mounts | before | after (real sec)
>> -----------------------------
>> 1024 | 0.071 | 0.024
>> 2048 | 0.184 | 0.030
>> 4096 | 0.604 | 0.040
>> 8912 | 4.471 | 0.043
>> 16384 | 34.826 | 0.082
>> 32768 | | 0.151
>> 65536 | | 0.289
>> 131072 | | 0.659
>>
>> Andrei Vagin fixing this performance problem is part of the
>> work to fix CVE-2016-6213.
>>
>> Cc: stable@xxxxxxxxxxxxxxx
>> Reported-by: Andrei Vagin <avagin@xxxxxxxxxx>
>> Signed-off-by: "Eric W. Biederman" <ebiederm@xxxxxxxxxxxx>
>> ---
>>
>> Andrei can you take a look at this patch and see if you can see any
>> problems. My limited testing suggests this approach does a much better
>> job of solving the problem you were seeing. With the time looking
>> almost linear in the number of mounts now.
>
> I read this patch and I like the idea.
>
> Then I run my tests and one of them doesn't work with this patch.
> I haven't found a reason yet.

>> + for (m = propagation_visit_next(parent, parent); m;
>> + m = propagation_visit_next(m, parent)) {
>> struct mount *child = __lookup_mnt_last(&m->mnt,
>> mnt->mnt_mountpoint);

The reason is that this loop is called for different "mnt", but
it is executed only once with this optimization.

So I think the idea to mark parent will not work, because one parent
can have a few children which have to be umounted.

>
> Here is the test:
>
> [root@fc24 mounts]# cat run.sh
> set -e
> mount -t tmpfs zdtm /mnt
> mkdir -p /mnt/1 /mnt/2
> mount -t tmpfs zdtm /mnt/1
> mount --make-shared /mnt/1
> for i in `seq $1`; do
> mount --bind /mnt/1 `mktemp -d /mnt/1/test.XXXXXX`
> done
> mount --rbind /mnt/1 /mnt/2
> cat /proc/self/mountinfo | grep zdtm | wc -l
> time umount -l /mnt/1
> cat /proc/self/mountinfo | grep zdtm | wc -l
> umount /mnt/2
>
>
> [root@fc24 mounts]# unshare -Urm ./run.sh 5
> 65
>
> real 0m0.014s
> user 0m0.000s
> sys 0m0.004s
> 33
> umount: /mnt/2: target is busy
> (In some cases useful info about processes that
> use the device is found by lsof(8) or fuser(1).)
>
>>

Thanks,
Andrei