Re: [PATCH] user namespaces: bump idmap limits

From: Serge E. Hallyn
Date: Wed Oct 11 2017 - 16:30:51 EST

Next message: kernel test robot: "[rcu] 02d700d7e1: WARNING:at_kernel/rcu/tree.c:#rcu_dynticks_momentary_idle"
Previous message: Sean Paul: "Re: [PATCH 1/2] drm: vblank: use ktime_t instead of timeval"
Next in thread: Eric W. Biederman: "Re: [PATCH] user namespaces: bump idmap limits"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

Quoting Christian Brauner (christian.brauner@xxxxxxxxxx):
> We have quite some use cases where users already run into the current limit for
> {g,u}id mappings. Consider a user requesting us to map everything but 999, and
> 1001 for a given range of 1000000000 with a sub{g,u}id layout of:
>
> some-user:100000:1000000000
> some-user:999:1
> some-user:1000:1
> some-user:1001:1
> some-user:1002:1
>
> This translates to:
>
> MAPPING-TYPE CONTAINER HOST RANGE
> uid 999 999 1
> uid 1001 1001 1
> uid 0 1000000 999
> uid 1000 1001000 1
> uid 1002 1001002 999998998
>
> gid 999 999 1
> gid 1001 1001 1
> gid 0 1000000 999
> gid 1000 1001000 1
> gid 1002 1001002 999998998
>
> which is already the current limit.
>
> Design Notes:
> As discussed at LPC simply bumping the number of limits is not going to work
> since this would mean that struct uid_gid_map won't fit into a single cache-line
> anymore thereby regressing performance for the base-cases. The same problem
> seems to arise when using a single pointer. So the idea is to keep the base
> cases (0-3 mappings) directly in struct uid_gid_map so they fit into a single
> cache-line of 64 byte. For the two removed mappings we place three pointers in
> the struct that mock the behavior of traditional filesystems:
> 1. a direct pointer to a struct uid_gid_extent of 5 mappings of 60 bytes
> 2. an indirect pointer to an array of 64 byte of direct pointers to struct
> uid_gid_extent of 5 mappings a 60 bytes each
> 3. a double indirect pointer to an array of 64 bytes of indirect pointers each
> to an array of 64 bytes of direct pointers (and so on)
> Fixing a pointer size of 8 byte this gives us 3 + 5 + (8 * 5) + (8 * (8 * 5)) =
> 368 mappings which should really be enough. The idea of this approach is to
> always have each extent of idmaps (struct uid_gid_extent) be 60 bytes (5 * (4 +
> 4 + 4) and thus 4 bytes smaller than the size of a single cache line. This
> should only see a (i.e. linear) performance impact caused by iterating through
> the idmappings in a for-loop. Note that the base cases shouldn't see any
> performance degradation which is the most important part.
>
> Performance Testing:
> When Eric introduced the extent-based struct uid_gid_map approach he measured
> the performanc impact of his idmap changes:
>
> > My benchmark consisted of going to single user mode where nothing else was
> > running. On an ext4 filesystem opening 1,000,000 files and looping through all
> > of the files 1000 times and calling fstat on the individuals files. This was
> > to ensure I was benchmarking stat times where the inodes were in the kernels
> > cache, but the inode values were not in the processors cache. My results:
>
> > v3.4-rc1: ~= 156ns (unmodified v3.4-rc1 with user namespace support disabled)
> > v3.4-rc1-userns-: ~= 155ns (v3.4-rc1 with my user namespace patches and user namespace support disabled)
> > v3.4-rc1-userns+: ~= 164ns (v3.4-rc1 with my user namespace patches and user namespace support enabled)
>
> I used an identical approach on my laptop. Here's a thorough description of what
> I did. I built three kernels and used an additional "control" kernel:
>
> 1. v4.14-rc2-vanilla (unmodified v4.14-rc2)
> 2. v4.14-rc2-userns+ (v4.14-rc2 with my new user namespace idmap limits patch)
> 3. v4.14-rc2-userns- (v4.14-rc2 without my new user namespace idmap limits patch)
> 4. v4.12.0-12-generic (v4.12.0-12 standard Ubuntu kernel)
>
> I booted into single user mode (systemd rescue target in newspeak) and used an
> ext4 filesystem to open/create 1,000,000 files. Then I looped through all of the
> files calling fstat() on each of them 1000 times and calculated the mean fstat()
> time for a single file. (The test program can be found below.)
>
> For kernels v4.14-rc2-vanilla, v4.12.0-12-generic I tested the following cases:
> 0 mappings
> 1 mapping
> 2 mappings
> 3 mappings
> 5 mappings
>
> For kernel v4.4-rc2-userns+ I tested:
> 0 mappings
> 1 mapping
> 2 mappings
> 3 mappings
> 5 mappings
> 10 mappings
> 50 mappings
> 100 mappings
> 200 mappings
> 300 mappings
>
> Here are the results:
>
> - v4.14-rc2-vanilla (unmodified v4.14-rc2)
> # no unshare: 312 ns
> unshare -U # write 0 mappings: 307 ns
> unshare -U # write 1 mappings: 328 ns
> unshare -U # write 2 mappings: 328 ns
> unshare -U # write 3 mappings: 328 ns
> unshare -U # write 5 mappings: 338 ns
>
> - v4.14-rc2-userns+ (v4.14-rc2 with my new user namespace idmap limits patch)
> # no unshare: 158 ns
> unshare -U # write 0 mappings: 158 ns
> unshare -U # write 1 mappings: 164 ns
> unshare -U # write 2 mappings: 170 ns
> unshare -U # write 3 mappings: 175 ns
> unshare -U # write 5 mappings: 187 ns
> unshare -U # write 10 mappings: 218 ns
> unshare -U # write 50 mappings: 528 ns
> unshare -U # write 100 mappings: 980 ns
> unshare -U # write 200 mappings: 1880 ns
> unshare -U # write 300 mappings: 2760 ns

These numbers look very good.

Eric, does that address your performance concern?

(If so I'll take a closer look at the patch itself, I've not yet done so)

thanks,
-serge

Next message: kernel test robot: "[rcu] 02d700d7e1: WARNING:at_kernel/rcu/tree.c:#rcu_dynticks_momentary_idle"
Previous message: Sean Paul: "Re: [PATCH 1/2] drm: vblank: use ktime_t instead of timeval"
Next in thread: Eric W. Biederman: "Re: [PATCH] user namespaces: bump idmap limits"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]