Re: [PATCH] user namespaces: bump idmap limits

From: Serge E. Hallyn
Date: Wed Oct 04 2017 - 10:29:03 EST


Quoting Christian Brauner (christian.brauner@xxxxxxxxxx):
> We have quite some use cases where users already run into the current limit for
> {g,u}id mappings. Consider a user requesting us to map everything but 999, and
> 1001 for a given range of 1000000000 with a sub{g,u}id layout of:
>
> some-user:100000:1000000000
> some-user:999:1
> some-user:1000:1
> some-user:1001:1
> some-user:1002:1
>
> This translates to:
>
> MAPPING-TYPE   CONTAINER   HOST      RANGE
> uid            999         999       1
> uid            1001        1001      1
> uid            0           1000000   999
> uid            1000        1001000   1
> uid            1002        1001002   999998998
>
> gid            999         999       1
> gid            1001        1001      1
> gid            0           1000000   999
> gid            1000        1001000   1
> gid            1002        1001002   999998998
>
> which is already the current limit.
>
> Design Notes:
> As discussed at LPC, simply bumping the number of allowed mappings is not going
> to work since struct uid_gid_map would no longer fit into a single cache line,
> regressing performance for the base cases. The same problem seems to arise when
> using a single pointer. So the idea is to keep the base cases (0-3 mappings)
> directly in struct uid_gid_map so they fit into a single 64-byte cache line. In
> place of the two removed mappings we put three pointers that mimic the scheme
> used by traditional filesystems:
> 1. a direct pointer to a struct uid_gid_extent holding 5 mappings (60 bytes)
> 2. an indirect pointer to a 64-byte array of direct pointers, each pointing to
> a struct uid_gid_extent holding 5 mappings (60 bytes each)
> 3. a double indirect pointer to a 64-byte array of indirect pointers, each
> pointing to a 64-byte array of direct pointers (and so on)
> Assuming a pointer size of 8 bytes this gives us 3 + 5 + (8 * 5) + (8 * (8 * 5))
> = 368 mappings, which should really be enough. The idea of this approach is to
> always have each extent of idmaps (struct uid_gid_extent) be 60 bytes (5 * (4 +
> 4 + 4)) and thus 4 bytes smaller than a single cache line. Lookups beyond the
> base cases should only see a linear performance impact caused by iterating
> through the idmappings in a for-loop. Note that the base cases shouldn't see any
> performance degradation, which is the most important part.

Sounds like a good plan.
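
For my own understanding, here's a rough sketch of how I read the proposed
layout. The field and constant names below are mine, not necessarily what the
patch uses, so take it as an illustration of the scheme rather than the actual
code:

#include <linux/types.h>

struct uid_gid_extent {
	u32 first;		/* first id inside the namespace */
	u32 lower_first;	/* first id in the parent namespace */
	u32 count;		/* number of ids mapped */
};				/* 12 bytes */

#define DIRECT_EXTENTS		3	/* kept inline: the 0-3 mapping base cases */
#define EXTENTS_PER_LEAF	5	/* 5 * 12 = 60 bytes, just under a cache line */
#define PTRS_PER_BLOCK		8	/* 64-byte block of 8-byte pointers */

struct uid_gid_extent_leaf {
	struct uid_gid_extent extent[EXTENTS_PER_LEAF];	/* 60 bytes */
};

struct uid_gid_map {
	u32 nr_extents;
	struct uid_gid_extent extent[DIRECT_EXTENTS];	/* 36 bytes: base cases */
	struct uid_gid_extent_leaf *direct;		/* +5 mappings */
	struct uid_gid_extent_leaf **indirect;		/* +8 * 5 = 40 mappings */
	struct uid_gid_extent_leaf ***dbl_indirect;	/* +8 * 8 * 5 = 320 mappings */
};	/* 4 + 36 + 3 * 8 = 64 bytes, one cache line; 3 + 5 + 40 + 320 = 368 max */

A lookup would then walk the inline extents first and only fall through to the
direct/indirect leaves for namespaces with more than 3 mappings, so the base
cases never touch a second cache line.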

> Performance Testing:
> When Eric introduced the extent-based struct uid_gid_map approach he measured
> the performance impact of his idmap changes:
>
> > My benchmark consisted of going to single user mode where nothing else was
> > running. On an ext4 filesystem opening 1,000,000 files and looping through all
> > of the files 1000 times and calling fstat on the individual files. This was
> > to ensure I was benchmarking stat times where the inodes were in the kernel's
> > cache, but the inode values were not in the processor's cache. My results:
>
> > v3.4-rc1: ~= 156ns (unmodified v3.4-rc1 with user namespace support disabled)
> > v3.4-rc1-userns-: ~= 155ns (v3.4-rc1 with my user namespace patches and user namespace support disabled)
> > v3.4-rc1-userns+: ~= 164ns (v3.4-rc1 with my user namespace patches and user namespace support enabled)
>
> I used an identical approach on my laptop. Here's a thorough description of what
> I did. I built three kernels and used an additional "control" kernel:
>
> 1. v4.14-rc2-vanilla (unmodified v4.14-rc2)
> 2. v4.14-rc2-userns+ (v4.14-rc2 with my new user namespace idmap limits patch)
> 3. v4.14-rc2-userns- (v4.14-rc2 without my new user namespace idmap limits patch)

^ you mean *with* your patch but with CONFIG_USER_NS=n?

> 4. v4.12.0-12-generic (v4.12.0-12 standard Ubuntu kernel)

^ Just curious, why did you include this? To show that other factors have a much
larger impact? This does not include your patch, right?

>
> I booted into single user mode (systemd rescue target in newspeak) and used an
> ext4 filesystem to open/create 1,000,000 files. Then I looped through all of the
> files calling fstat() on each of them 1000 times and calculated the mean fstat()
> time for a single file. (The test program can be found below.)
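
The test program itself isn't quoted here. Just to illustrate the methodology
described above (this is not Christian's actual program, and the file names and
limits are made up), the measurement loop boils down to something like:

/*
 * Illustrative sketch only, not the test program referenced above.
 * Assumes files f0..f999999 can be created in the current directory and
 * that RLIMIT_NOFILE is high enough to keep them all open at once.
 */
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

#define NFILES	1000000
#define ROUNDS	1000

int main(void)
{
	static int fds[NFILES];
	struct timespec start, end;
	struct stat st;
	char name[32];
	double total_ns;
	long i, r;

	/* open/create all files once so their inodes end up in the inode cache */
	for (i = 0; i < NFILES; i++) {
		snprintf(name, sizeof(name), "f%ld", i);
		fds[i] = open(name, O_RDONLY | O_CREAT, 0644);
		if (fds[i] < 0) {
			perror("open");
			return 1;
		}
	}

	/* fstat() every file ROUNDS times and report the mean per call */
	clock_gettime(CLOCK_MONOTONIC, &start);
	for (r = 0; r < ROUNDS; r++)
		for (i = 0; i < NFILES; i++)
			fstat(fds[i], &st);
	clock_gettime(CLOCK_MONOTONIC, &end);

	total_ns = (end.tv_sec - start.tv_sec) * 1e9 +
		   (end.tv_nsec - start.tv_nsec);
	printf("mean fstat(): %.0f ns\n",
	       total_ns / ((double)NFILES * ROUNDS));
	return 0;
}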
>
> For kernels v4.14-rc2-vanilla and v4.12.0-12-generic I tested the following cases:
> 0 mappings
> 1 mapping
> 2 mappings
> 3 mappings
> 5 mappings
>
> For kernel v4.14-rc2-userns+ I tested:
> 0 mappings
> 1 mapping
> 2 mappings
> 3 mappings
> 5 mappings
> 10 mappings
> 50 mappings
> 100 mappings
> 200 mappings
> 300 mappings
>
> Here are the results:
>
> - v4.14-rc2-vanilla (unmodified v4.14-rc2)
> # no unshare: 312 ns
> unshare -U # write 0 mappings: 307 ns
> unshare -U # write 1 mappings: 328 ns
> unshare -U # write 2 mappings: 328 ns
> unshare -U # write 3 mappings: 328 ns
> unshare -U # write 5 mappings: 338 ns
>
> - v4.14-rc2-userns+ (v4.14-rc2 with my new user namespace idmap limits patch)
> # no unshare: 158 ns
> unshare -U # write 0 mappings: 158 ns
> unshare -U # write 1 mappings: 164 ns
> unshare -U # write 2 mappings: 170 ns
> unshare -U # write 3 mappings: 175 ns
> unshare -U # write 5 mappings: 187 ns
> unshare -U # write 10 mappings: 218 ns
> unshare -U # write 50 mappings: 528 ns
> unshare -U # write 100 mappings: 980 ns
> unshare -U # write 200 mappings: 1880 ns
> unshare -U # write 300 mappings: 2760 ns
>
> - v4.14-rc2-userns- (v4.14-rc2 without my new user namespace idmap limits patch)
> # no unshare: 161 ns
>
> - 4.12.0-12-generic Ubuntu Kernel:
> # no unshare: 328 ns
> unshare -U # write 0 mappings: 327 ns
> unshare -U # write 1 mappings: 328 ns
> unshare -U # write 2 mappings: 328 ns
> unshare -U # write 3 mappings: 328 ns
> unshare -U # write 5 mappings: 338 ns
>

^ This is really weird. Why does the Ubuntu kernel have near-constant (horrible)
times?

> I've tested this multiple times and the numbers hold up. All v4.14-rc2 kernels
> were built on the same machine with the same .config, the same options and a
> simple call to make -j 11 bindeb-pkg. The 4.12 kernel was simply installed from
> the Ubuntu archives.
>
> The most important part seems to me that my idmap patches don't regress performance
> for the base-cases. I'd actually only consider 0 and 1 mapping to be the proper

Agreed. Now personally I probably would have kept 4 direct pointers and made
the 5+ case hurt more, but I'm not saying that's the right thing.

(haven't looked at the patch itself yet)

thanks,
-serge