Re: [PATCH] user namespaces: bump idmap limits

From: Christian Brauner
Date: Wed Oct 04 2017 - 10:45:34 EST


On Wed, Oct 04, 2017 at 09:28:57AM -0500, Serge Hallyn wrote:
> Quoting Christian Brauner (christian.brauner@xxxxxxxxxx):
> > We have quite some use cases where users already run into the current limit for
> > {g,u}id mappings. Consider a user requesting us to map everything but 999, and
> > 1001 for a given range of 1000000000 with a sub{g,u}id layout of:
> >
> > some-user:100000:1000000000
> > some-user:999:1
> > some-user:1000:1
> > some-user:1001:1
> > some-user:1002:1
> >
> > This translates to:
> >
> > MAPPING-TYPE CONTAINER HOST RANGE
> > uid 999 999 1
> > uid 1001 1001 1
> > uid 0 1000000 999
> > uid 1000 1001000 1
> > uid 1002 1001002 999998998
> >
> > gid 999 999 1
> > gid 1001 1001 1
> > gid 0 1000000 999
> > gid 1000 1001000 1
> > gid 1002 1001002 999998998
> >
> > which is already the current limit.
> >
> > Design Notes:
> > As discussed at LPC simply bumping the number of limits is not going to work
> > since this would mean that struct uid_gid_map won't fit into a single cache-line
> > anymore thereby regressing performance for the base-cases. The same problem
> > seems to arise when using a single pointer. So the idea is to keep the base
> > cases (0-3 mappings) directly in struct uid_gid_map so they fit into a single
> > cache-line of 64 byte. For the two removed mappings we place three pointers in
> > the struct that mock the behavior of traditional filesystems:
> > 1. a direct pointer to a struct uid_gid_extent of 5 mappings of 60 bytes
> > 2. an indirect pointer to an array of 64 byte of direct pointers to struct
> > uid_gid_extent of 5 mappings a 60 bytes each
> > 3. a double indirect pointer to an array of 64 bytes of indirect pointers each
> > to an array of 64 bytes of direct pointers (and so on)
> > Fixing a pointer size of 8 byte this gives us 3 + 5 + (8 * 5) + (8 * (8 * 5)) =
> > 368 mappings which should really be enough. The idea of this approach is to
> > always have each extent of idmaps (struct uid_gid_extent) be 60 bytes (5 * (4 +
> > 4 + 4) and thus 4 bytes smaller than the size of a single cache line. This
> > should only see a (i.e. linear) performance impact caused by iterating through
> > the idmappings in a for-loop. Note that the base cases shouldn't see any
> > performance degradation which is the most important part.
>
> Sounds like a good plan.
>
> > Performance Testing:
> > When Eric introduced the extent-based struct uid_gid_map approach he measured
> > the performanc impact of his idmap changes:
> >
> > > My benchmark consisted of going to single user mode where nothing else was
> > > running. On an ext4 filesystem opening 1,000,000 files and looping through all
> > > of the files 1000 times and calling fstat on the individuals files. This was
> > > to ensure I was benchmarking stat times where the inodes were in the kernels
> > > cache, but the inode values were not in the processors cache. My results:
> >
> > > v3.4-rc1: ~= 156ns (unmodified v3.4-rc1 with user namespace support disabled)
> > > v3.4-rc1-userns-: ~= 155ns (v3.4-rc1 with my user namespace patches and user namespace support disabled)
> > > v3.4-rc1-userns+: ~= 164ns (v3.4-rc1 with my user namespace patches and user namespace support enabled)
> >
> > I used an identical approach on my laptop. Here's a thorough description of what
> > I did. I built three kernels and used an additional "control" kernel:
> >
> > 1. v4.14-rc2-vanilla (unmodified v4.14-rc2)
> > 2. v4.14-rc2-userns+ (v4.14-rc2 with my new user namespace idmap limits patch)
> > 3. v4.14-rc2-userns- (v4.14-rc2 without my new user namespace idmap limits patch)
>
> ^ you mean *withYou your patch but with CONFIG_USER_NS=n ?

Yes, exactly. Sorry, that was unclear here.

>
> > 4. v4.12.0-12-generic (v4.12.0-12 standard Ubuntu kernel)
>
> ^ Just curious, why did you include this? To show that other factors have a much
> larger impact? This does not include your patch, right?

Basically I wanted something which I didn't compile and see if the numbers
somehow line-up. In terms of experimentation you could think of this as a second
"control condition".

>
> >
> > I booted into single user mode (systemd rescue target in newspeak) and used an
> > ext4 filesystem to open/create 1,000,000 files. Then I looped through all of the
> > files calling fstat() on each of them 1000 times and calculated the mean fstat()
> > time for a single file. (The test program can be found below.)
> >
> > For kernels v4.14-rc2-vanilla, v4.12.0-12-generic I tested the following cases:
> > 0 mappings
> > 1 mapping
> > 2 mappings
> > 3 mappings
> > 5 mappings
> >
> > For kernel v4.4-rc2-userns+ I tested:
> > 0 mappings
> > 1 mapping
> > 2 mappings
> > 3 mappings
> > 5 mappings
> > 10 mappings
> > 50 mappings
> > 100 mappings
> > 200 mappings
> > 300 mappings
> >
> > Here are the results:
> >
> > - v4.14-rc2-vanilla (unmodified v4.14-rc2)
> > # no unshare: 312 ns
> > unshare -U # write 0 mappings: 307 ns
> > unshare -U # write 1 mappings: 328 ns
> > unshare -U # write 2 mappings: 328 ns
> > unshare -U # write 3 mappings: 328 ns
> > unshare -U # write 5 mappings: 338 ns
> >
> > - v4.14-rc2-userns+ (v4.14-rc2 with my new user namespace idmap limits patch)
> > # no unshare: 158 ns
> > unshare -U # write 0 mappings: 158 ns
> > unshare -U # write 1 mappings: 164 ns
> > unshare -U # write 2 mappings: 170 ns
> > unshare -U # write 3 mappings: 175 ns
> > unshare -U # write 5 mappings: 187 ns
> > unshare -U # write 10 mappings: 218 ns
> > unshare -U # write 50 mappings: 528 ns
> > unshare -U # write 100 mappings: 980 ns
> > unshare -U # write 200 mappings: 1880 ns
> > unshare -U # write 300 mappings: 2760 ns
> >
> > - v3.4-rc1-userns-: ~= 155ns (v3.4-rc1 with my user namespace patches and user namespace support disabled)
> > # no unshare: 161 ns
> >
> > - 4.12.0-12-generic Ubuntu Kernel:
> > # no unshare: 328 ns
> > unshare -U # write 0 mappings: 327 ns
> > unshare -U # write 1 mappings: 328 ns
> > unshare -U # write 2 mappings: 328 ns
> > unshare -U # write 3 mappings: 328 ns
> > unshare -U # write 5 mappings: 338 ns
> >
>
> ^ This is really weird. Why does Ubuntu kernel have near-constant (horrible)
> time?

I actually think - even in single user mode - with the same number of processes
running and so on - that there's a lot of fluctuation going on. That's why I ran
the tests multiple times. It might also depend on compilation since I compiled
the three kernels myself and just downloaded the binaries for the ubuntu kernel.
The tests clearly show that there's an increase with the number of mappings
which is what I expected.

>
> > I've tested this multiple times and the numbers hold up. All v4.14-rc2 kernels
> > were built on the same machine with the same .config, the same options and a
> > simple call to make -j 11 bindeb-pkg. The 4.12 kernel was simply installed from
> > the Ubuntu archives.
> >
> > The most import part seems to me that my idmap patches don't regress performance
> > for the base-cases. I'd actually only consider 0 and 1 mapping to be the proper
>
> Agreed. Now personally I probably would have kept 4 direct pointers then make
> the 5+ case hurt more, but I'm not saying that's the right thing.

Yeah, I thought about that as well but my goal was to basically ramp up the
number of mappings into the hundreds to settle this "once and for all". I
actually don't expect us to go any higher than this. Tbh, users that have a
requirement to have many mappings should be prepared to take the performance
hit. Also, I think that the direct pointers won't necessarily give you more
speed since - I'd guess - that the slowdown simply comes from the number of
iterations through the map you have to do and not necessarily from cache misses.
But I might be thinking nonsense here. Thanks!

>
> (haven't looked at the patch itself yet)
>
> thanks,
> -serge